Chapter 16

XML-Derived Markup Languages

16.1 Introduction

Throughout this book so far, we’ve discussed XML itself, the more common ways in which XML vocabularies are defined, and – predominantly – how you can query XML documents using a variety of approaches, including XPath, XQuery, and SQL/XML.

In this chapter, we take a look at a number of specialized XML languages[1] and express a few thoughts about the ways in which documents built using those vocabularies might be queried. In particular, we’ll see that at least some of these vocabularies might be more appropriately queried using languages other than those we’ve featured in this book.

We also look at how XML vocabularies can be used to discover things on the web, such as web services and businesses from which you can buy or to which you can sell, and at how you can query that information to get what you need.

16.2 Markup Languages

XML, being the extensible markup language, serves as the definitional paradigm for a large set of specialized markup languages (many of which are no more than vocabularies). Many different industries, scientific disciplines, individual organizations, and even individuals have designed and published the specifications for XML vocabularies suitable for specific needs. In Chapter 1, “XML,” we speculated that we could create an XML-based markup language called MDL (Movie Definition Language) for managing our movie collections. In fact, the XML Schema published in Appendix A describes the XML document containing our movie data. That could quite reasonably be called the definition of MDL’s vocabulary. Almost any DTD or XML Schema falls into the same category, whether or not the authors of the DTD or Schema make that claim.

There are undoubtedly hundreds – probably thousands – of specialized XML languages that have been defined for one purpose or another. In this section, we take a look at only a few of the markup languages (MathML, SMIL, SVG) that have achieved some level of standardization and acceptance. Our choices are not actually random, but they do represent a variety of communities. In fact, this section looks at some relatively esoteric markup languages specifically because many of the more commonly used languages have been discussed elsewhere in this book.

We illustrate in this chapter that general-purpose query languages (e.g., XQuery) can readily query documents in any XML vocabulary, but that the query authors must themselves supply the semantics of the markup language. Special-purpose query languages that build those semantics into the query language itself could be defined for each markup language; however, this is rarely done.

One such markup language (not further discussed in this chapter) is XBRL, the Extensible Business Reporting Language.[2] The United States Securities and Exchange Commission announced[3] in late 2004 a new program encouraging businesses to voluntarily submit supplemental financial information marked up using XBRL for posting on EDGAR, the Commission’s Electronic Data Gathering, Analysis, and Retrieval System (an online financial information database accessible to the public). We believe that the U.S. Government will eventually require reporting of such information in an XML-based vocabulary, probably XBRL or some derivative vocabulary. Naturally, there will be significant interest in querying financial data reported using XBRL!

16.2.1 MathML

MathML[4] is, according to the specification’s abstract, “an XML application for describing mathematical notation and capturing both its structure and content” and its goal is “to enable mathematics to be served, received, and processed on the World Wide Web, just as HTML has enabled this functionality for text.” It is quite beyond the scope of this book to give a detailed description of MathML, so we’ve limited ourselves to a bit of introduction and a couple of examples of MathML use.

Until MathML was defined, the principal system used for formatting mathematical notations was Donald Knuth’s TeX.[5] Put charitably, although TeX probably produces the best-looking computer-generated mathematical notations in print, the language is rather difficult for most people to use. MathML has not displaced TeX, but it certainly represents a major step forward both in the use of computers to typeset mathematical notations and in the generation and accessibility of marked-up material.

MathML is intended to be used both for mathematical notation and mathematical content in web (or hardcopy) documents. The specification defines about 180 different elements, some 30 of which are used to describe “abstract notational structures” and the rest to “unambiguously [specify] the intended meaning of an expression.”

There are three broad categories of MathML elements. One, presentation markup, is used to describe the two-dimensional layout and presentation of mathematical expressions. A second, content markup, provides access to the semantics of those expressions; there is clearly a relationship between mathematical notation and the underlying semantics, but there are also obvious differences. For example, stating that a digit is to be set as a superscript immediately following a letter is significantly different from specifying that the value represented by that letter is to be raised to the power indicated by the digit. The third category of MathML elements provides the MathML interface – these are used primarily when MathML expressions are embedded into other markup languages, such as HTML, and we do not discuss them further.

Example 16-1 and Example 16-2 illustrate two different ways of marking up the equation seen in Equation 16-1.

Equation 16-1   Example Equation

$$(a + b)^2$$

Example 16-1   Equation Marked Up Using MathML Presentation Markup

The <mrow> element indicates that its content is intended to be a single horizontal “row” (in this context, a row is nothing more than a horizontally related group of objects), <msup> signals a superscripted expression, <mfenced> surrounds the expression it contains with parentheses, and <mi>, <mn>, and <mo> encapsulate identifiers, numbers, and operators, respectively.

Example 16-2   Equation Marked Up Using MathML Content Markup

As before, the <mrow> element indicates that its content is intended to be a single horizontal “row,” but that’s where the similarities end. <apply> applies the operator given as its first child to the arguments that follow it, <power> and <plus> specify the operations to be applied, and <ci> and <cn> represent identifiers and numbers, respectively.

MathML processors are allowed to use either of these markup conventions (presentation or content), and might generate the same visual results for both. In general, however, the content markup approach makes it easier for other applications to consume the marked-up expressions and act on them semantically. For example, a calculator application could consume the content markup of that equation, request values for the two variables a and b, and then perform the appropriate calculations.
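To make the calculator idea concrete, here is a minimal sketch in Python (standing in for whatever a real application would use) that walks content markup for (a + b)^2 and evaluates it for supplied values of a and b. The markup string is our own reconstruction in the spirit of Example 16-2, with the MathML namespace omitted for brevity; the evaluator understands only <mrow>, <apply>, <plus/>, <power/>, <ci>, and <cn>, and the names evaluate and OPERATORS are hypothetical. Nothing comparable could be done with the presentation markup alone, which records only that a superscript follows a parenthesized group.

```python
import xml.etree.ElementTree as ET

# Hypothetical content markup for (a + b)^2, modeled on the description of
# Example 16-2 (the MathML namespace is omitted for brevity).
CONTENT_MARKUP = """
<mrow>
  <apply>
    <power/>
    <apply><plus/><ci>a</ci><ci>b</ci></apply>
    <cn>2</cn>
  </apply>
</mrow>
"""

# Only the two operators used in the example are supported.
OPERATORS = {
    "plus": lambda args: sum(args),
    "power": lambda args: args[0] ** args[1],
}


def evaluate(elem, env):
    """Recursively evaluate a tiny subset of MathML content markup."""
    if elem.tag == "mrow":
        # Treat a row holding a single expression as that expression.
        return evaluate(elem[0], env)
    if elem.tag == "cn":
        return float(elem.text)
    if elem.tag == "ci":
        return env[elem.text.strip()]
    if elem.tag == "apply":
        # The first child names the operator; the rest are its arguments.
        operator, *operands = list(elem)
        values = [evaluate(child, env) for child in operands]
        return OPERATORS[operator.tag](values)
    raise ValueError(f"unsupported element: {elem.tag}")


root = ET.fromstring(CONTENT_MARKUP)
print(evaluate(root, {"a": 3, "b": 4}))  # 49.0
```

Supporting more of MathML is a matter of extending the OPERATORS table, which is precisely the kind of semantic knowledge that a generic XML query language does not supply.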

It’s also possible to combine the two forms of markup for a single mathematical expression, as indicated in Example 16-3. This sort of combination allows you to specify the form in which you’d like the expression to be presented, while providing the semantics of the expression as an annotation that doesn’t affect the presentation.

Example 16-3   Equation Marked Up with Both Presentation and Content Markup

Now, as you have seen from these examples, MathML is obviously “just another” form of XML. As such, it’s certainly easy to imagine querying it using XPath or XQuery. For example, if we wanted to find whether there are any square (or cube or higher-order) expressions in Example 16-2 and, if so, what the operands of those expressions are, we might write the XPath expression found in Example 16-4.

Example 16-4   An XPath Expression Applied to a MathML Expression

The value returned by that expression contains any and all following siblings of any and all power elements in the XPath context. Of course, if there were no power elements, the expression would return the empty sequence. But, if there were multiple power elements, the expression would return the following siblings of all of them without any obvious way of determining which elements were associated with which power element (other than the fact that they’d be in document order, that is).
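The following Python sketch, using the standard xml.etree.ElementTree module, mimics what we take Example 16-4 to do (we assume an expression along the lines of //power/following-sibling::*) on a hypothetical document containing two power elements, and contrasts that flat result with a grouped one. ElementTree supports only a subset of XPath and has no following-sibling axis, so the sketch inspects each parent's children itself; both function names are our own.

```python
import xml.etree.ElementTree as ET

# Hypothetical content markup holding two powers: (a + b)^2 and c^3.
DOC = """
<mrow>
  <apply><power/><apply><plus/><ci>a</ci><ci>b</ci></apply><cn>2</cn></apply>
  <apply><power/><ci>c</ci><cn>3</cn></apply>
</mrow>
"""


def following_siblings_of_power(root):
    """Flat result, like //power/following-sibling::*, in document order."""
    position = {elem: i for i, elem in enumerate(root.iter())}
    found = set()
    # ElementTree has no following-sibling axis, so inspect each parent's children.
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag == "power":
                found.update(children[i + 1:])
    return sorted(found, key=position.__getitem__)


def operands_by_power(root):
    """Grouped result: one (base, exponent) pair per <apply> headed by <power/>."""
    return [
        tuple(node)[1:]
        for node in root.iter("apply")
        if len(node) and node[0].tag == "power"
    ]


root = ET.fromstring(DOC)
flat = following_siblings_of_power(root)
print([e.tag for e in flat])  # ['apply', 'cn', 'ci', 'cn']
grouped = operands_by_power(root)
print([(base.tag, exponent.text) for base, exponent in grouped])  # [('apply', '2'), ('ci', '3')]
```

The flat list mixes the operands of both powers, exactly as described above; the grouped version keeps each base with its exponent only because it was written with the knowledge that <power/> is the first child of an <apply>.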

Note that the simple XPath expression in Example 16-4 makes no use of the fact that the power element represents exponentiation, nor does it recognize that the values returned show the expression to be the square (and not the cube) of another expression. Sure, XPath (and XQuery) can be used to query MathML expressions of this sort, but no mathematical knowledge is implied by such queries.

It might be nice to design some sort of special-purpose MathML-inspired query language that would allow us to ask a question such as “for each power(2), return MathML-query(contents)” or (more generally) questions such as “what formulae in this document make use of both cube roots and integration?” Would such a language be useful? Probably not too often, but it might sometimes be helpful to mathematicians working with collections of documents marked up in MathML. Although the MathML specification refers to the importance of “automatic searching and indexing” of mathematical documents, it does not otherwise mention ways (or languages) by which such searching might be done. We doubt that there is sufficient demand to justify design and implementation of a special-purpose query language just for this purpose. If ever created, such a language might be represented simply as an application built using XQuery as its foundation, transforming the kinds of questions we’ve asked into XQuery (or XPath) expressions.
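As a rough illustration of that last idea, the sketch below translates the question “which formulae use both cube roots and integration?” into structural tests on content markup. It is written in Python rather than XQuery purely so that it can run stand-alone; the two formulae are hypothetical, the MathML namespace is omitted, and the helper names are ours. It relies on the MathML content conventions that a cube root is written as <root/> qualified by <degree><cn>3</cn></degree> and that integration uses <int/>.

```python
import xml.etree.ElementTree as ET

# Two hypothetical formulae in content markup (MathML namespace omitted):
# f1 integrates the cube root of x; f2 is just the cube root of y.
FORMULAE = {
    "f1": """<math><apply><int/><bvar><ci>x</ci></bvar>
               <apply><root/><degree><cn>3</cn></degree><ci>x</ci></apply>
             </apply></math>""",
    "f2": """<math><apply><root/><degree><cn>3</cn></degree><ci>y</ci></apply></math>""",
}


def applications_of(math, operator):
    """Yield every <apply> whose first child is the named operator element."""
    for node in math.iter("apply"):
        if len(node) and node[0].tag == operator:
            yield node


def has_cube_root(math):
    # A cube root is <root/> qualified by <degree><cn>3</cn></degree>.
    for node in applications_of(math, "root"):
        degree = node.find("degree/cn")
        if degree is not None and degree.text.strip() == "3":
            return True
    return False


def has_integration(math):
    return any(True for _ in applications_of(math, "int"))


def uses_cube_roots_and_integration(formulae):
    """Names of the formulae that contain both a cube root and an integral."""
    matches = []
    for name, text in formulae.items():
        math = ET.fromstring(text)
        if has_cube_root(math) and has_integration(math):
            matches.append(name)
    return matches


print(uses_cube_roots_and_integration(FORMULAE))  # ['f1']
```

A MathML-aware query facility built this way would amount to a library of such helpers, each encoding a piece of mathematical meaning that the underlying XML query machinery knows nothing about.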

16.2.2 SMIL

SMIL[6] (Synchronized Multimedia Integration Language) is a markup language published by the W3C to integrate “a set of independent multimedia objects into a synchronized multimedia presentation.” Such presentations might include animations, audio, video, text, and other forms of information. The language allows the specification of a presentation’s temporal behavior (that is, what events happen when), the layout of a presentation (meaning what information appears where on a visual display), and hyperlinks to various media objects (such as a video file published by another author).

The syntax of SMIL is defined normatively by means of a series of DTDs. It is also defined (but not normatively) by means of a series of XML Schemas. There is more than one DTD (and more than one Schema) because the SMIL specification defines the language in a very modular way, allowing implementations to choose which of various modules to implement. Each module is defined by a single XML Schema or DTD. At least one SMIL profile (a precise identification of a set of features specified in a standard) appears normatively as a DTD – and informatively as an XML Schema – in the SMIL Recommendation.

A SMIL “document” is a single <smil> element that contains either a <head> element, a <body> element, or both. The <head> element contains any layout specifications and may use either the SMIL-defined <layout> element or the facilities of CSS2.[7] It may also contain information about the presentation (such as the title, creation date, keywords, and so forth).

The <layout> element “determines how the elements in the document’s body are positioned on an abstract rendering surface (either visual or acoustic).” If the element is omitted, then the layout is determined by the SMIL implementation. SMIL’s <layout> element is used only to control the layout of the media object elements that are identified in the <body> element (the layout of all other elements must be specified using CSS2). For example, a <layout> element may contain a <region> element that defines a part of the “rendering surface” by name, giving that region an absolute position relative to the position of the <smil> element itself. The <region> element has a number of attributes, all optional. These attributes allow the specification of such characteristics as background color, scaling (or fit) of an object within the region, and the position of the region.
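To show how region definitions might be queried, here is a minimal Python sketch that resolves each media element's region attribute to the region's rectangle and reports pairs of media objects whose regions overlap on screen. It assumes a hypothetical SMIL fragment with the namespace omitted, region positions given in plain pixels, and regions identified by id; it ignores percentages, default values, stacking order, and timing, all of which a real SMIL processor would have to take into account.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical SMIL document (namespace omitted); region sizes are in pixels.
SMIL = """
<smil>
  <head>
    <layout>
      <region id="main" left="0" top="0" width="320" height="240"/>
      <region id="caption" left="0" top="200" width="320" height="40"/>
      <region id="logo" left="330" top="0" width="60" height="60"/>
    </layout>
  </head>
  <body>
    <par>
      <video src="kennedy.mpg" region="main"/>
      <text src="caption.txt" region="caption"/>
      <img src="logo.png" region="logo"/>
    </par>
  </body>
</smil>
"""


def region_boxes(root):
    """Map each region id to (left, top, right, bottom)."""
    boxes = {}
    for region in root.iter("region"):
        left, top = int(region.get("left")), int(region.get("top"))
        width, height = int(region.get("width")), int(region.get("height"))
        boxes[region.get("id")] = (left, top, left + width, top + height)
    return boxes


def overlapping_media(root):
    """Pairs of media objects (by src) whose regions share some screen area."""
    boxes = region_boxes(root)
    media = [e for e in root.find("body").iter() if e.get("region") in boxes]
    pairs = []
    for a, b in combinations(media, 2):
        a_left, a_top, a_right, a_bottom = boxes[a.get("region")]
        b_left, b_top, b_right, b_bottom = boxes[b.get("region")]
        # Two rectangles overlap when they overlap both horizontally and vertically.
        if a_left < b_right and b_left < a_right and a_top < b_bottom and b_top < a_bottom:
            pairs.append((a.get("src"), b.get("src")))
    return pairs


root = ET.fromstring(SMIL)
print(overlapping_media(root))  # [('kennedy.mpg', 'caption.txt')]
```

This answers only the spatial half of a question such as whether text ever obscures an image; whether the two objects are also visible at the same time depends on the timing semantics described next.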

The <body> element is the heart of SMIL. It provides “information that is related to the temporal and linking behavior of the document” and can contain any of 11 kinds of child element. Among these are a <par> element whose children identify objects that are allowed to overlap in time (we suspect that “par” is intended to evoke “parallel”), and a <seq> element whose children identify objects that form a temporal sequence.

Those <par> and <seq> elements have precisely the same possible child elements as the <body> element. This implies, of course, that objects participating in an overlapping presentation can themselves be overlapping or sequential presentations of other objects.

Other children of the <body> element (as well as of the <par> and <seq> elements) include <animation>, <audio>, <img>, <text>, <textstream>, and <video>. Some of those elements (<animation>, <audio>, and <video>) specify objects, called continuous media, that have an inherent duration. Others (<img>, <text>, and <textstream>) specify objects without an intrinsic duration; these are called discrete media. The attributes of the media elements allow specification of such characteristics as the length of time after the object’s container is activated before the object itself is activated, the length of time during which the object is activated, the name of the region in which the object is to be activated, and captions associated with the object.
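The timing rules for <par> and <seq> can be sketched as a short recursive walk. The Python below is a simplified model we assume for illustration, not SMIL's full timing semantics: every media element carries an explicit dur, begin is an offset from the moment the element would otherwise start, clock values use only the simple “5s” form, children of a <seq> start one after another, and children of a <par> start together, with the <par> ending when its last child ends. The file names and the helper names seconds and schedule are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical timing structure (namespace omitted): a video, then a map and
# a soundtrack played in parallel.
TIMING = """
<seq>
  <video src="kennedy.mpg" dur="40s"/>
  <par>
    <img src="vietnam-map.png" begin="2s" dur="20s"/>
    <audio src="theme.mp3" begin="5s" dur="30s"/>
  </par>
</seq>
"""

MEDIA = {"animation", "audio", "img", "text", "textstream", "video"}


def seconds(value):
    """Parse a clock value of the simple form '5s'; a missing value means 0."""
    return float(value.rstrip("s")) if value else 0.0


def schedule(elem, start, timeline):
    """Record (begin, end) for each media object; return elem's end time."""
    start += seconds(elem.get("begin"))
    if elem.tag in MEDIA:
        end = start + seconds(elem.get("dur"))
        timeline[elem.get("src")] = (start, end)
        return end
    if elem.tag == "seq":
        end = start
        for child in elem:
            # Each child of a seq starts when the previous one ends.
            end = schedule(child, end, timeline)
        return end
    if elem.tag == "par":
        # Children of a par start together; the par ends with its last child.
        return max((schedule(child, start, timeline) for child in elem), default=start)
    raise ValueError(f"unsupported element: {elem.tag}")


timeline = {}
schedule(ET.fromstring(TIMING), 0.0, timeline)
for src, (begin, end) in timeline.items():
    print(f"{src}: {begin:g}s to {end:g}s")
# kennedy.mpg: 0s to 40s
# vietnam-map.png: 42s to 62s
# theme.mp3: 45s to 75s

video_end = timeline["kennedy.mpg"][1]
map_begin = timeline["vietnam-map.png"][0]
print("video ends first" if video_end <= map_begin else "video still playing")
```

Once every object has an interval, a question about whether one object ends before another appears reduces to comparing two numbers; the hard part, which this sketch sidesteps, is computing the intervals under SMIL's complete timing model.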

Every SMIL object appears in a four-dimensional space: horizontal, vertical, depth (a z-axis that governs which of several physically overlapping objects appears on top), and time. The elements representing those objects can be given attributes that determine whether or not the object appears at all, based on various tests: for example, if the SMIL implementation can determine that the environment has no sound facilities, then no <audio> elements should cause sound presentations to be emitted from the implementation.

It might be reasonable for applications dealing with SMIL presentations to ask questions such as “Are there any images that are partially obscured by text while the theme song from Shaft is playing?” “What physically adjacent objects visible at the same time (wholly or partly) have complementary background colors?” “Does the video of President Kennedy terminate before or after the map of Vietnam appears on the screen?” or “Where on the display can I add my Easter bunny animation, starting seven minutes into the presentation and lasting for 30 seconds, without covering up the photograph of my youngest child?”

SMIL is a language designed explicitly for multimedia authors and is rapidly gaining acceptance within that community. Authoring tools for SMIL-based media are already emerging, but the need for broad, flexible search capabilities has not yet been recognized by most of the community using SMIL. Yet the whole point of representing multimedia productions in SMIL is that they can then be processed by a variety of tools that interpret the SMIL vocabulary, whether to manipulate the productions (editing them, for example) or to search (query) them.

We are not surprised that we’ve found no evidence of a special-purpose query language designed to support queries that answer questions such as these. Writing XPath or XQuery expressions to answer them is certainly possible, but would undoubtedly be tedious because of the level of detailed knowledge of a SMIL document’s structure that would be required. On the other hand, defining XQuery and XPath function libraries that support queries on SMIL documents seems like a reasonable approach to providing the capabilities of a purpose-built SMIL query language. We would expect that, for something as complex as multimedia documents, end users would have to be provided with some sort of visual (GUI) tool that would generate the XQuery expressions and function invocations.

16.2.3 SVG

SVG[8] (Scalable Vector Graphics) is an XML markup language used for representing graphics that can be displayed on the web, in print, and elsewhere at whatever resolution the destination medium can support (that’s the “scalable” part of the name). The graphics that are created with SVG are inherently “two-dimensional vector and mixed vector/raster graphics” in nature. The vocabulary supports three types of objects: vector graphic shapes (that is, line drawings), images (also known as raster graphics), and text. The graphics can be interactive (responding to an event like a mouse click) and dynamic (changing the image as time passes), the latter implying animation.

The value in having an XML vocabulary to represent such graphics lies in the ability to abstract the specification of the graphics from the sequences of bits used to draw them onto some medium, whether paper or video monitor. Earlier ways of encoding this sort of graphic tended to depend on much less flexible notations, often binary notations that were difficult to interpret in the absence of an explicitly provided processor for each specific kind of graphic. SVG, on the other hand, can be “read” by humans who can get at least a sense of the graphic.

SVG shares a number of characteristics with, and even incorporates a few features from, SMIL (see Section 16.2.2). For example, SVG takes SMIL’s animation feature and extends it. Furthermore, SVG was designed to be used to create both dynamic and static objects for SMIL.

An SVG image can be used “stand-alone” as a web page by itself, it can be linked (or referenced) from an XHTML or other XML document, and it can be embedded in another SVG image. Although the SVG specification defines a large number of elements and attributes, the language has a simplicity to it that makes it relatively easy to grasp. For example, consider the first shape specified in Example 16-5.

Example 16-5   Simple SVG Images

The first shape includes four different rectangles, of different sizes, with another rectangle drawn as a narrow blue line at the borders of the image area.
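Because Example 16-5 is not reproduced in this text, the Python sketch below uses a hypothetical drawing in the same spirit: four filled rectangles plus a thin blue outline around the image area. It highlights one practical detail that any XPath-style query over SVG must handle: SVG elements live in the namespace http://www.w3.org/2000/svg, so an unqualified path such as .//rect finds nothing. The prefix svg in SVG_NS is our own choice; any prefix bound to that namespace would do.

```python
import xml.etree.ElementTree as ET

SVG_NS = {"svg": "http://www.w3.org/2000/svg"}

# Hypothetical drawing in the spirit of the first shape in Example 16-5.
DRAWING = """
<svg xmlns="http://www.w3.org/2000/svg" width="400" height="200">
  <rect x="20" y="20" width="60" height="40" fill="red"/>
  <rect x="100" y="20" width="80" height="120" fill="green"/>
  <rect x="200" y="60" width="40" height="40" fill="yellow"/>
  <rect x="260" y="20" width="120" height="60" fill="gray"/>
  <rect x="1" y="1" width="398" height="198" fill="none" stroke="blue" stroke-width="2"/>
</svg>
"""

root = ET.fromstring(DRAWING)

# An unqualified path matches nothing, because every element is in the SVG namespace.
print(root.findall(".//rect"))  # []

rects = root.findall(".//svg:rect", SVG_NS)
# SVG's default fill is black, so only fill="none" marks an unfilled shape.
filled = [r for r in rects if r.get("fill", "black") != "none"]
outlines = [r for r in rects if r.get("fill") == "none" and r.get("stroke")]
largest = max(filled, key=lambda r: float(r.get("width")) * float(r.get("height")))
print(len(filled), len(outlines), largest.get("fill"))  # 4 1 green
```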

We observed above that one SVG graphic can be embedded within another one. The second shape in Example 16-5 illustrates that possibility (but observe that the nested graphic is enclosed in a <g> element). Figure 16-1 shows the results of evaluating the first two SVG drawings in Example 16-5.

Figure 16-1 The Visual Results: Two SVG Images.

What sorts of questions might one want to ask about SVG graphics? In some ways, they are likely to be similar to those asked about SMIL documents. Due to the greater imaging power of SVG, the questions might ask about characteristics not directly available in SMIL. For example, we might want to know the width of every blue rectangle resting at an angle with the right end higher than the left, whose longer dimension is the dimension closest to horizontal. Or we might need to learn whether there are any raster images that would ever partly or wholly obscure a vector image such that both images are located within an animation that slides images around in the viewing area.

Overall, the questions are similar enough to those we might ask in a SMIL context that we believe the situation is likely to be the same: No special-purpose query language exists to support queries that answer questions such as these; and writing XPath or XQuery expressions to answer them is possible, but probably tedious. A possible XPath expression to answer the question about the blue rectangle (whose SVG expression is the third example in Example 16-5 and whose visual representation is in Figure 16-2) can be found in Example 16-6. (Because of production limitations, we regret that the viewport outline and rectangle cannot be in glorious color.)

Figure 16-2 The Visual Result: A Slanted Blue Rectangle.

Example 16-6   Finding the Width of the Blue Rectangle

Not exactly obvious, perhaps, but it is usable – at least as long as the questions don’t get too complex. Would a specialized query language for SVG be useful? Perhaps, but (as for SMIL) we are unaware of any efforts to produce one.
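For comparison, here is a Python sketch of the same “tilted blue rectangle” question, again on a hypothetical drawing rather than the book's Example 16-5. It makes several simplifying assumptions: only a single rotate() in a rectangle's own transform attribute is considered (other transforms, and transforms on enclosing <g> elements, are ignored), “blue” means one of three spellings of that color, and only rectangles whose width exceeds their height are considered. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = {"svg": "http://www.w3.org/2000/svg"}
BLUE = {"blue", "#0000ff", "#00f"}

# Hypothetical drawing: one blue rectangle tilted so its right end is higher,
# one blue rectangle tilted the other way, and one red rectangle.
DRAWING = """
<svg xmlns="http://www.w3.org/2000/svg" width="300" height="200">
  <rect x="50" y="80" width="200" height="60" fill="blue" transform="rotate(-20 150 110)"/>
  <rect x="50" y="20" width="100" height="30" fill="blue" transform="rotate(25 100 35)"/>
  <rect x="200" y="20" width="80" height="30" fill="red" transform="rotate(-20 240 35)"/>
</svg>
"""


def rotation(rect):
    """Return the rotate() angle in degrees, or 0 if there is none (simplified)."""
    match = re.search(r"rotate\(\s*(-?\d+(?:\.\d+)?)", rect.get("transform", ""))
    return float(match.group(1)) if match else 0.0


def widths_of_tilted_blue_rects(root):
    widths = []
    for rect in root.findall(".//svg:rect", SVG_NS):
        if rect.get("fill", "").lower() not in BLUE:
            continue
        width, height = float(rect.get("width")), float(rect.get("height"))
        # A rectangle looks the same after a half turn, so fold the angle into [-90, 90).
        angle = (rotation(rect) + 90) % 180 - 90
        # SVG's y axis points down, so a negative angle lifts the right end;
        # staying within 45 degrees keeps the longer side nearest horizontal.
        if width > height and -45 < angle < 0:
            widths.append(width)
    return widths


print(widths_of_tilted_blue_rects(ET.fromstring(DRAWING)))  # [200.0]
```

Even this small function has to encode facts about SVG (the downward-pointing y axis, the symmetry of a rectangle under a half turn) that no generic XML query language knows.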

Through examples like the ones we’ve provided in Section 16.2, we have concluded that there is great value in defining specialized markup languages for specific domains, but that the cost of defining special-purpose query languages for most of those markup languages would exceed the value they’d provide. XQuery and XPath, on the other hand, can be used to perform meaningful queries on all of them – even though the expressions required to express such queries are likely to be cumbersome in many cases.

16.3 Discovery on the World Wide Web

The World Wide Web is – well, big. Estimates of its size range to billions of pages. In all those pages, we wouldn’t be surprised if most of human knowledge were captured – somewhere. Finding the knowledge in which we’re interested at any given moment is a challenge that is addressed by search engines such as Google, AltaVista, Yahoo, and AskJeeves, as well as metasearch engines like Dogpile and Mamma. (In Chapter 18, “Finding Stuff,” you’ll read more about querying and search engine technology.)

Search engines, which seem to improve almost daily, work pretty well for use by humans sitting in front of a computer, formulating queries that the search engines evaluate. (We don’t address such queries in this book, largely because they are directed toward HTML and text rather than XML, but also because the syntaxes they use are generally engine-specific.)

But when application software is the initiator of a search on the web, ordinary search engine technology is rarely appropriate. This is at least partly because applications are rather unlikely to be curious about the latest football scores, to want to catch up on celebrity gossip, to research SVG for inclusion in a book, or to track down a support group for a rare disease. Instead, applications are more likely to be looking for purveyors of services – often called web services – required by the application doing the search. (Of course, the purpose of some of those web services may well be the offering of sports scores, titillation, and self-help!)

In fact, many web publishers, especially in the news business, are already seeing a shift in the use of their sites from eyeballs to applications. Such publishers are beginning to redesign their sites to cater directly to applications’ use of the data they provide – or, in some cases, to fend off such applications’ direct consumption of their data! Offering content as a web service (something already being done by Google and others) is the logical extension of this trend, and a potentially important business model for such publishers.

The organization or individual offering a web service obviously has to somehow publish the information about that service. For the service to be most accessible, the published information should be available directly to applications and not require the intervention of a person. This implies that the information for all web services should be published in a standardized format – and that quickly suggests a standardized XML language.

In fact, the term web service is defined by a W3C architecture document[9] this way:

A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.

Of course, merely having a standardized vocabulary used for publishing information about your web service doesn’t solve the entire problem – you’ve also got to ensure that applications that might be interested in your web service know how to find that information you’ve published. We should note that publication of a web service doesn’t necessarily mean that it can be found only through searches; indeed, businesses may simply exchange their web service publications and use the information for prearranged business-to-business activities. This section focuses primarily on the standardized XML vocabulary for web services (discussed at greater length in Chapter 18, “Finding Stuff”), but we also glance briefly at the way in which that information about your service can be located.

Several approaches to publishing standardized information about web services are either in use today or being designed for use in the near future. Two of the most important are the W3C’s WSDL (Web Services Description Language) and UDDI[10] (Universal Description Discovery & Integration) from OASIS. WSDL is currently specified in four parts: a Primer,[11] a Core language specification,[12] some predefined extensions,[13] and specifications for binding WSDL[14] to SOAP[15] (SOAP, the Simple Object Access Protocol, is a W3C specification[16] for “XML-based information which can be used for exchanging structured and typed information between peers in a decentralized, distributed environment”) and to HTTP.[17] In addition, the WSDL suite of documents includes XML Schemas for WSDL itself, for the SOAP binding, and for the HTTP binding.

As you may have inferred from the names of the various WSDL documents, not all of them deal with an XML vocabulary. Because WSDL is all about defining web services, we have to understand what it takes to define such a service. It’s beyond the scope of this book to cover all of the details, so we’ll just mention the highlights: message formats, data types, transport protocols, transport serialization formats, and perhaps some information about the message exchange pattern that is expected when the service is used.

Perhaps more importantly, web services involve a specification of the service or services that are offered, the resources involved in providing those services, the policies governing the use of the resources, and the messages through which a service communicates with its clients. All of these aspects are represented in XML using a specialized vocabulary.

WSDL provides vocabularies to specify the kinds of messages that a web service can send and receive, to describe the functionality that the service provides, to describe how to access the service, and to indicate where the service can be found on the web. For example, WSDL defines an XML element called <types> in which a web service definer specifies the various kinds of messages that the service consumes or emits. It also provides an <interface> element that is used to describe the services provided; each specific service is described by an <operation> child element.
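As a small illustration of querying WSDL with generic XML tools, the Python sketch below lists the operations of each interface in an abbreviated, hypothetical WSDL 2.0 description. We assume the namespace http://www.w3.org/ns/wsdl and the message exchange pattern URIs of the final WSDL 2.0 Recommendation, which postdates the drafts discussed here; a complete description would also contain <types>, <binding>, and <service> elements, plus input and output details for each operation. The function name operations is ours.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://www.w3.org/ns/wsdl"}

# Abbreviated, hypothetical WSDL 2.0 description (no types, bindings, or services).
DESCRIPTION = """
<description xmlns="http://www.w3.org/ns/wsdl"
             targetNamespace="http://example.com/movies">
  <interface name="MovieCatalog">
    <operation name="findByTitle" pattern="http://www.w3.org/ns/wsdl/in-out"/>
    <operation name="addMovie" pattern="http://www.w3.org/ns/wsdl/in-only"/>
  </interface>
</description>
"""


def operations(description):
    """Map each interface name to a list of (operation name, pattern) pairs."""
    return {
        iface.get("name"): [
            (op.get("name"), op.get("pattern").rsplit("/", 1)[-1])
            for op in iface.findall("wsdl:operation", WSDL_NS)
        ]
        for iface in description.findall("wsdl:interface", WSDL_NS)
    }


root = ET.fromstring(DESCRIPTION)
print(operations(root))
# {'MovieCatalog': [('findByTitle', 'in-out'), ('addMovie', 'in-only')]}
```

The query happily returns the pattern names, but it has no idea that in-out means a request followed by a response; that knowledge has to come from the application, which is the point made in the next paragraph.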

WSDL is fairly complex, defining a large number of elements and attributes and specifying the relationships between them. The complexity is, of course, due to the fact that it must be able to describe a very wide variety of services with all of the different aspects they present. That said, the descriptions are quite regular, and they are in any case described by XML Schemas. Nonetheless, finding information about a web service described using WSDL requires some thought. As we’ve seen in earlier sections of this chapter, XPath and, by extension, XQuery can be used for that purpose, but neither language has any particular knowledge of what a WSDL document means. Surely, you might ask, it is desirable for applications to be able to use searching facilities that understand the meaning of WSDL descriptions. As we discuss below, that isn’t necessarily true.

UDDI, as its name implies, focuses on assisting applications in locating web services having particular characteristics – such as offering particular services, perhaps at a specific price or within a required time frame. UDDI provides “the definition of a set of services supporting the description and discovery of (1) businesses, organizations, and other Web services providers, (2) the Web services they make available, and (3) the technical interfaces which may be used to access those services.”

Sounds a lot like WSDL, doesn’t it? The difference between WSDL and UDDI, however, is that WSDL focuses on describing the services themselves, while UDDI focuses on publication of the services for automatic discovery. UDDI defines its own specialized vocabulary, oriented toward the creation, automatic maintenance, and automatic use of registries of services (that is, collections of descriptions of services, which might be represented in WSDL, as well as in other languages). UDDI registries capture business entities (the providers of web services), business services, templates for describing the information required to use a service, the relationship between one business entity and another (relative to the service), and requests by business entities to be kept informed of changes to the service.

UDDI does not provide a specific query language that consumers of web services use to locate services. It does, however, provide an API, the UDDI Inquiry API, that “provides the ability to issue precise searches based on the different classification schemes.” The Inquiry API accepts XML fragments whose elements and attributes describe the business, service, or access detail that is required. It is through those elements and attributes that the details of a query are specified.
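To illustrate how easily a program can produce such an inquiry fragment, here is a Python sketch that builds a minimal UDDI version 3 find_service request asking for services with a given name. The namespace urn:uddi-org:api_v3 and the find_service and name elements come from the UDDI v3 inquiry API; the service name is hypothetical, and a real request would usually add find qualifiers or category information and be sent inside a SOAP envelope to a registry's inquiry endpoint. The default_namespace argument of ElementTree's tostring requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

UDDI_V3 = "urn:uddi-org:api_v3"


def find_service_request(service_name):
    """Build a minimal UDDI v3 find_service inquiry for the given service name."""
    request = ET.Element(f"{{{UDDI_V3}}}find_service")
    name = ET.SubElement(request, f"{{{UDDI_V3}}}name")
    name.text = service_name
    # default_namespace writes xmlns="..." instead of a generated ns0: prefix.
    return ET.tostring(request, encoding="unicode", default_namespace=UDDI_V3)


print(find_service_request("MovieCatalog"))
# <find_service xmlns="urn:uddi-org:api_v3"><name>MovieCatalog</name></find_service>
```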

XQuery, as you read in Chapter 12, has both a “human-readable” syntax and an XML syntax. The language used to query UDDI registries has only an XML syntax. We are not aware of any interest in providing a syntax more comfortable to human users. In our opinion, such a syntax would be of little use, as machine generation of an XML fragment to describe search criteria is at least as easy as generation of an expression in the kind of less regular language that humans use more readily.

But what about searching the WSDL-expressed information that a UDDI registry might contain? Since UDDI is not limited to a registry of web services that are described in WSDL, its inquiry APIs have been designed without specific WSDL syntax in mind. Instead, UDDI’s APIs express search conditions in an abstract form that the UDDI registries’ engines can adapt to the specific syntax and vocabularies of the web service description languages that they support.

(On a less positive note, there are in fact two “flavors” of WSDL and two competing UDDI registries – one from Microsoft and one from IBM.)

16.4 Customized Query Languages

As we’ve seen throughout this chapter, the specialized markup languages are not generally accompanied by an associated, specialized query language. XPath and XQuery can always be used to query documents expressed in such vocabularies, but those two languages have the weakness that they are unaware of the underlying meaning of the documents being searched.

As you will read in Chapter 18, “Finding Stuff,” the effort to apply meaning to the information available on the web involves specification of a model known as RDF (Resource Description Framework) that can be used to represent the semantics of data in ways that can be used automatically. RDF is not inherently represented in XML, but XML is certainly going to be one of the more popular ways of representing RDF. But, represented in XML or not, RDF has certain specific characteristics that both simplify and complicate the process of uncovering the information that it captures. A query language named SPARQL has been designed by the W3C specifically to query information represented as RDF.

So, there we have it: at least one important specialized (not necessarily XML) vocabulary has been given a specialized query language to accompany it. But one should ask why XQuery (or XPath) would not be adequate for querying RDF information. Surely, if MathML, SMIL, SVG, or even WSDL does not deserve a special-purpose query language, then RDF cannot justify one either. We confess to having asked exactly these questions when we first learned of SPARQL.

But we subsequently realized that XPath and XQuery are purpose-built to query data represented in the XQuery 1.0 and XPath 2.0 Data Model (described in Chapter 10, “Introduction to XQuery 1.0,” and other chapters of this book) and can certainly be used to query data represented in an Infoset (see Chapter 6, “The XML Information Set (Infoset) and Beyond”), while RDF uses quite a different data model. It is, of course, possible to use XPath and XQuery to find information in RDF (at least when it is represented in XML), but SPARQL is designed explicitly for querying data represented in the RDF data model.
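A small illustration of that data-model mismatch, as a Python sketch using only the standard library's xml.etree.ElementTree: RDF/XML permits the same triple to be written either as a property element or as a property attribute. A tree-oriented path query (here ElementTree's iter, standing in for an XPath expression such as //dc:title) finds the title in only one of the two serializations, while reading each document as triples finds the same statement in both. The triples function is a hypothetical, deliberately minimal reader that handles only the two forms shown, and the movie URI is made up.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Two RDF/XML serializations of one triple:
# <http://example.org/movie/1> dc:title "Vertigo"
ELEMENT_FORM = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/movie/1">
    <dc:title>Vertigo</dc:title>
  </rdf:Description>
</rdf:RDF>"""

ATTRIBUTE_FORM = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/movie/1" dc:title="Vertigo"/>
</rdf:RDF>"""


def tree_query_titles(doc):
    """Tree-oriented query: collect dc:title *elements*, like //dc:title."""
    root = ET.fromstring(doc)
    return [e.text for e in root.iter(f"{{{DC}}}title")]


def triples(doc):
    """Hypothetical, minimal RDF/XML reader: handles only rdf:Description
    nodes with rdf:about, literal property elements, and property
    attributes. A real RDF parser handles far more of the syntax."""
    root = ET.fromstring(doc)
    result = set()
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):  # skip rdf:about itself
                result.add((subject, name, value))
        for prop in desc:
            result.add((subject, prop.tag, (prop.text or "").strip()))
    return result


print(tree_query_titles(ELEMENT_FORM))    # ['Vertigo']
print(tree_query_titles(ATTRIBUTE_FORM))  # []
print(triples(ELEMENT_FORM) == triples(ATTRIBUTE_FORM))  # True
```

An XPath or XQuery author can, of course, write a query that covers both forms, but every such query must anticipate the serialization choices that the RDF data model itself treats as irrelevant.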

That doesn’t mean that the two data models, or the two languages, are intended to be competitors. They have quite different sets of goals and requirements, and they serve quite different needs. At the time of writing, neither SPARQL nor XQuery 1.0 (nor XPath 2.0) had reached final Recommendation status within the W3C. That leaves an opportunity for interested parties to rationalize the situation and perhaps bring both the models and the languages into harmony with one another.

Much the same sort of debate could be had regarding the suitability of XQuery (and XPath) for querying documents defined using specialized XML vocabularies, such as RSS18 documents and specifications of ontologies.19 In our opinion, XQuery is usable for such purposes, but – as for other specialized vocabularies discussed in this chapter – not ideal for them. Within the W3C, we expect that SPARQL will be chosen as the most appropriate language for querying ontologies. We do not know of any effort to define a query language specifically oriented toward RSS, although we are aware of a language called RQL (RDF Query Language) that can be used to query RSS in its RDF form.
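As a rough illustration of "usable, but not ideal," the following Python sketch (standard library only) extracts item titles from two trimmed, hypothetical feeds. A generic path expression works for RSS 2.0, where item elements are unnamespaced children of channel, but RSS 1.0, the RDF-based flavor, places namespaced item elements alongside channel, so the query has to know which flavor it is reading. The feed contents are invented, and both feeds omit elements that a valid feed requires.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"

# Trimmed RSS 2.0 feed: <item> elements are children of <channel>, no namespace.
RSS2_FEED = """<rss version="2.0"><channel><title>Movie News</title>
  <item><title>Vertigo restored</title></item>
  <item><title>Rope on DVD</title></item>
</channel></rss>"""

# Trimmed RSS 1.0 (RDF) feed: <item> elements are siblings of <channel>,
# in the RSS 1.0 namespace (required channel children omitted for brevity).
RSS1_FEED = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{RSS1_NS}">
  <channel rdf:about="http://example.org/news"><title>Movie News</title></channel>
  <item rdf:about="http://example.org/news/1"><title>Vertigo restored</title></item>
  <item rdf:about="http://example.org/news/2"><title>Rope on DVD</title></item>
</rdf:RDF>"""


def item_titles(feed_xml: str) -> list:
    """Return item titles, choosing a path expression per RSS flavor."""
    root = ET.fromstring(feed_xml)
    if root.tag == "rss":  # RSS 2.0
        path = "./channel/item/title"
    elif root.tag == f"{{{RDF_NS}}}RDF":  # RSS 1.0
        path = f"./{{{RSS1_NS}}}item/{{{RSS1_NS}}}title"
    else:
        raise ValueError(f"unrecognized feed root element: {root.tag}")
    return [t.text for t in root.findall(path)]


print(item_titles(RSS2_FEED))  # ['Vertigo restored', 'Rope on DVD']
print(item_titles(RSS1_FEED))  # ['Vertigo restored', 'Rope on DVD']
```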

16.5 Chapter Summary

In this chapter, we’ve seen that XML is used for more than merely exchanging data and documents, or storing them. It is the foundation for a very large number of specialized XML vocabularies, some of which have very large communities of users and some of which are narrowly focused. We explored a small number of specialized XML vocabularies and considered whether they would benefit from having a custom query language that could be used by their user communities; we concluded that, in most cases, either there was no great motivation to query the marked-up data (at least not enough to support development, standardization, and implementation of specialized query languages), or the obvious solution was to query the data using XQuery (or an application based on XQuery).

Does this mean that we believe that XQuery and XPath are sufficient to address all of the world’s XML querying problems? For the most part, XQuery and XPath are sufficient (if not entirely ideal) for querying data represented in XML, whether in specialized vocabularies or not. But, of course, XQuery and XPath are not satisfactory for “all of the world’s querying problems,” since not all of the world’s data is (or should be) represented in XML. We (the collective “we”) must weigh a number of factors when evaluating special-purpose query languages. Among them are the size of the potential user community for a given XML vocabulary, the difficulty of using existing tools (e.g., XPath, XQuery, DOM), the added benefits of a purpose-built query language, the cost of defining and standardizing that language, and the probability that useful implementations of it will appear. When all of these factors are taken into account, we think it is not surprising that a small number of powerful general-purpose languages, supplemented by a few specialized query languages, do a sufficiently good job of meeting market requirements.

Again, XQuery is a good (maybe even the best possible?) base for querying anything that is represented as XML. Specialized XML languages may motivate specialized query languages, but we suspect that those languages would probably be based on XQuery. RDF is an exception, because RDF is not an XML language, even though it can be represented in XML form.


1As you read in Chapter 1, “XML,” we use the term markup language to mean both the definition (normally done using an XML Schema) of the elements and attributes that are used to mark up information and the description of what each of those components means. By contrast, we use the term markup vocabulary to mean only the element and attribute definitions, exclusive of any explicit semantic definition of their meanings.

2Extensible Business Reporting Language (XBRL) 2.1, XBRL Recommendation (New York: XBRL International, 2005). Available at: http://www.xbrl.org/Specification/XBRL-RECOMMENDATION-2003-12-31+Corrected-Errata-2005-04-25.htm.

3XBRL Voluntary Reporting Program on the EDGAR System (U.S. Securities and Exchange Commission). Available at: http://www.sec.gov/rules/final/33-8529.htm.

4Mathematical Markup Language (MathML) Version 2.0, second edition, W3C Recommendation (Cambridge, MA: World Wide Web Consortium, 2003). Available at: http://www.w3.org/TR/MathML2/.

5Donald Knuth, The TeXbook (New York: Addison-Wesley Professional, 1984).

6Synchronized Multimedia Integration Language (SMIL 2.0), second edition, W3C Recommendation (Cambridge, MA: World Wide Web Consortium, 2005). Available at: http://www.w3.org/TR/SMIL/.

7Cascading Style Sheets Level 2, CSS2 Specification, W3C Recommendation (Cambridge, MA: World Wide Web Consortium, 1998). Available at: http://www.w3.org/TR/REC-CSS2/.

8Scalable Vector Graphics (SVG) 1.1 Specification, W3C Recommendation (Cambridge, MA: World Wide Web Consortium, 2003). Available at: http://www.w3.org/TR/SVG11/.

9Web Services Architecture, W3C Note (Cambridge, MA: World Wide Web Consortium, 2004). Available at: http://www.w3.org/TR/ws-arch/.

10UDDI Version 3.0.2, UDDI Spec Technical Committee Draft (OASIS Open, 2004). Available at: http://uddi.org/pubs/uddi_v3.htm.

11Web Services Description Language (WSDL) Version 2.0 Part 0: Primer, W3C Working Draft (Cambridge, MA: World Wide Web Consortium, 2004). Available at: http://www.w3.org/TR/wsdl20-primer.

12Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language, W3C Working Draft (Cambridge, MA: World Wide Web Consortium, 2004). Available at: http://www.w3.org/TR/wsdl20.

13Web Services Description Language (WSDL) Version 2.0 Part 2: Predefined Extensions, W3C Working Draft (Cambridge, MA: World Wide Web Consortium, 2004). Available at: http://www.w3.org/TR/wsdl20-extensions.

14Web Services Description Language (WSDL) Version 2.0 Part 3: Bindings, W3C Working Draft (Cambridge, MA: World Wide Web Consortium, 2004). Available at: http://www.w3.org/TR/wsdl20-binding.

15SOAP Version 1.2 Part 1: Messaging Framework, W3C Recommendation (Cambridge, MA: World Wide Web Consortium, 2003). Available at: http://www.w3.org/TR/soap12-part1/.

16SOAP Version 1.2 Primer, W3C Recommendation (Cambridge, MA: World Wide Web Consortium, 2003). Available at: http://www.w3.org/TR/soap12-part0/.

17RFC 2616, Hypertext Transfer Protocol – HTTP/1.1 (The Internet Society, 1999). Available at: http://www.ietf.org/rfc/rfc2616.txt.

18The name “RSS” has been variously claimed to be an acronym for “Rich Site Summary,” “RDF Site Summary,” and “Really Simple Syndication” (and probably other phrases as well). RSS provides a “publish/subscribe” mechanism for sharing content from one website with other websites. It is increasingly used to publish news headlines, weblogs (“blogs”), and the like. RSS is normally specified in an XML format, initially (but no longer) as an RDF document.

19An ontology is “an explicit formal specification of how to represent the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them” (taken from http://dli.grainger.uiuc.edu/glossary.htm). In the W3C, ontologies are specified using OWL (see http://www.w3.org/2004/OWL/ for additional information about this RDF-based vocabulary, which is commonly represented in XML).
