Chapter 20. XML information modeling

This chapter addresses some of the general modeling and design questions that come up when designing XML documents, and to a lesser extent the schemas that describe them. For developers who are accustomed to defining data as entity-relationship models, relational tables, UML, and object-oriented models and classes, there is a learning curve associated with the hierarchical model of XML.

This chapter will help you up that curve. It first compares XML modeling and design to other disciplines, such as relational models and object-oriented models, and shows how XML Schema features can be used to describe these models. It then provides some general design principles for modeling web services, dealing with document-oriented narrative content, and working with a hierarchical information model.

20.1. Data modeling paradigms

If you are approaching the subject of XML Schema with some previous background in data design, you may be wondering how to represent in XML concepts from

• Relational models, such as entity-relationship data models or relational database design

• Object-oriented models, which may exist for example as UML class diagrams and/or object-oriented program code

You may continue to use these modeling paradigms along with your XML application. For example, you may be parsing XML and storing it in a relational database (this is sometimes known as “shredding”), in which case you still have a relational model for your data. You may be processing your XML documents with object-oriented code, so there still needs to be a correspondence between the XML and the object model.
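As a rough illustration of shredding, the sketch below parses a small order document and stores it in two relational tables. The element names, table names, and column names are all hypothetical, chosen only for this example; real shredding code would depend on your own schema and database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document; the element names are illustrative only.
ORDER_XML = """
<order number="1234">
  <customer id="C001">
    <name>Example Customer</name>
  </customer>
  <lineItem><productId>557</productId><quantity>2</quantity></lineItem>
  <lineItem><productId>563</productId><quantity>1</quantity></lineItem>
</order>
"""


def shred_order(xml_text, conn):
    """Split one order document into rows of two (hypothetical) tables."""
    order = ET.fromstring(xml_text)
    number = order.get("number")
    customer_id = order.find("customer").get("id")
    conn.execute(
        "INSERT INTO orders (number, customer_id) VALUES (?, ?)",
        (number, customer_id),
    )
    # Each repeating child element becomes one row in a child table.
    for item in order.findall("lineItem"):
        conn.execute(
            "INSERT INTO line_items (order_number, product_id, quantity) "
            "VALUES (?, ?, ?)",
            (number, item.findtext("productId"), int(item.findtext("quantity"))),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (number TEXT PRIMARY KEY, customer_id TEXT)")
conn.execute(
    "CREATE TABLE line_items (order_number TEXT, product_id TEXT, quantity INTEGER)"
)
shred_order(ORDER_XML, conn)
rows = conn.execute(
    "SELECT product_id, quantity FROM line_items ORDER BY product_id"
).fetchall()
```

The hierarchy in the document (line items nested inside an order) turns into a foreign-key column (`order_number`) in the relational form, which is why a relational model is still needed alongside the XML.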

Some schema designers choose to maintain separate models, such as UML models, entity-relationship diagrams, and/or supplementary documentation, alongside the XML Schema. Others rely more heavily on the XML Schema to represent the entire model. This is convenient in that the schema maps one-to-one to the actual XML documents in use. The drawbacks are that XML Schema cannot express every constraint on the data and that it is somewhat technology-specific.

Some developers maintain a connection between the models using toolkits that generate program code or even databases. It is particularly common to use data binding toolkits to generate object-oriented classes from schemas. As appropriate, this chapter describes some of the considerations for designing XML documents to optimize the use of these toolkits.
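To give a feel for what a data binding toolkit produces, here is a hand-written sketch of the kind of class and conversion code such a toolkit might generate. It does not use any real toolkit; the `Product` class, its fields, and the `bind_product` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from decimal import Decimal
import xml.etree.ElementTree as ET


@dataclass
class Product:
    """Hypothetical class corresponding to a <product> element."""
    number: int
    name: str
    price: Decimal


def bind_product(element):
    """Convert a <product> element into a typed Product object."""
    return Product(
        number=int(element.findtext("number")),
        name=element.findtext("name"),
        price=Decimal(element.findtext("price")),
    )


# Illustrative instance; the element names are assumptions.
PRODUCT_XML = (
    "<product><number>557</number><name>Linen Blouse</name>"
    "<price>29.99</price></product>"
)
product = bind_product(ET.fromstring(PRODUCT_XML))
```

Each child element maps to one typed field, so the simpler and more regular the XML structure is, the more natural the resulting classes tend to be.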

20.2. Relational models

Designing an XML message structure is different in some ways from traditional entity-relationship modeling and relational database design, where the data model is a persistent-storage representation of the data. When creating an entity-relationship model, great care is (hopefully) taken to define what an entity is, as opposed to how it is used in any particular context. For example, when you model a “customer” entity, you decide on your definition of a customer, its unique identifier, and all of its attributes. You also normalize all the relationships between customers and other entities: For example, a customer can have one or more addresses, and can be associated with zero or more purchases.

An XML message, on the other hand, often represents a particular usage or view of the data, useful at a particular time in a certain operation. Instead of being the definitive source for all information about that entity, it contains only the subset that is useful for the operation in question. For a purchase order, you probably do not need to include all of the information that can be known about a customer; perhaps you just need an identifier, name, and shipping address. For a line item in the purchase order, you may need to know a product’s identifier, name, and price, but not its other attributes such as a long description or a list of features.
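The idea of a message as a view can be sketched as follows. The full customer record, its field names, and the choice of which fields a purchase order needs are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical "definitive" customer record, as it might exist in a database.
FULL_CUSTOMER = {
    "id": "C001",
    "name": "Example Customer",
    "shippingAddress": "123 Main St, Anytown",
    "billingAddress": "PO Box 9, Anytown",
    "creditLimit": "5000",
}

# Assumed subset of properties that the purchase order message needs.
ORDER_VIEW_FIELDS = ("name", "shippingAddress")


def customer_for_order(record):
    """Build a <customer> element holding only the order-relevant subset."""
    customer = ET.Element("customer", id=record["id"])
    for field in ORDER_VIEW_FIELDS:
        ET.SubElement(customer, field).text = record[field]
    return customer


order = ET.Element("purchaseOrder")
order.append(customer_for_order(FULL_CUSTOMER))
xml_text = ET.tostring(order, encoding="unicode")
```

The billing address and credit limit remain part of the customer entity, but they never appear in this particular message.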

Relationships also differ in the two models. In an entity-relationship or relational model, there is no single starting point to the model; entities exist and can be accessed independently of each other. In an XML hierarchy, one element must be at the root of the structure, and there is an implied relationship between all of the elements within that hierarchy. Again, only the relationships that are relevant to the particular message are included, and their cardinality may differ in the message as compared to the relational data model. Representing relationships in XML is discussed later in this chapter.

In an ideal scenario, you will have a standardized canonical model that you will draw on for your XML message schemas. Just as in relational database design, in XML message design it makes sense to use the same element names, types, and relationships for the same data where possible. For example, if your corporate data model says that an Address entity has the properties line1, line2, city, state, and zip, it makes sense to use the same definitions and names (or the relevant subset of them) for the elements in your XML messages.
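One lightweight way to keep messages aligned with a canonical model is to check element names against it. The sketch below assumes the Address properties named above; the `noncanonical_children` helper and the sample messages are hypothetical.

```python
import xml.etree.ElementTree as ET

# Property names of the canonical Address entity described in the text.
CANONICAL_ADDRESS = ("line1", "line2", "city", "state", "zip")


def noncanonical_children(address):
    """List child element names that are not part of the canonical Address."""
    return [child.tag for child in address if child.tag not in CANONICAL_ADDRESS]


# A message that uses a subset of the canonical names.
good = ET.fromstring(
    "<shipTo><line1>123 Main St</line1><city>Anytown</city>"
    "<state>MI</state><zip>12345</zip></shipTo>"
)
# A message that invents its own name for the same data.
bad = ET.fromstring(
    "<shipTo><line1>123 Main St</line1><postalCode>12345</postalCode></shipTo>"
)
good_issues = noncanonical_children(good)
bad_issues = noncanonical_children(bad)
```

Here the first message passes because it uses only canonical names, while the second is flagged for using `postalCode` where the canonical model says `zip`.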

On the other hand, it is best to avoid tightly coupling your XML messages with any one relational database schema. You might use the same names and definitions if they are well-designed, but you should not, for example, generate your XML schemas from relational databases or have your application automatically insert the contents of XML elements into relational columns of the same name. Either practice would couple the XML message too closely to the database: the message schema would have to change whenever the database changes.
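One possible way to loosen that coupling is an explicit mapping layer between element names and column names. This is only a sketch of the idea; the column names and the `to_row` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from message element names to database column names.
# If a column is renamed, only this table changes, not the message schema.
ELEMENT_TO_COLUMN = {
    "line1": "addr_line_1",
    "city": "city_name",
    "zip": "postal_code",
}


def to_row(address):
    """Translate an address element into a column-name -> value dict."""
    return {
        column: address.findtext(element)
        for element, column in ELEMENT_TO_COLUMN.items()
    }


address = ET.fromstring(
    "<address><line1>123 Main St</line1><city>Anytown</city>"
    "<zip>12345</zip></address>"
)
row = to_row(address)
```

The message keeps its own vocabulary, and the database keeps its own; the mapping is the only place where the two meet.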

20.2.1. Entities and attributes

In a relational model, you will typically have entities, each with a set of attributes or properties. In a relational database, these would be implemented as tables and columns, with each instance represented as a row with multiple cells. In XML, this roughly translates into elements with complex content and elements with simple content. For the entity-relationship model shown in Figure 20–1, our first cut at representing that in XML (leaving aside the relationships for now) might be as shown in Example 20–1.
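As a minimal sketch of that translation, assume two hypothetical entities, customer and order (not necessarily those in Figure 20–1). Each entity becomes an element with complex content, and each of its properties becomes a child element with simple content. The `as_rows` helper, also an assumption of this example, shows the correspondence by turning the XML back into table-like rows.

```python
import xml.etree.ElementTree as ET

# Hypothetical entities; element names and values are illustrative only.
MODEL_XML = """
<model>
  <customer>
    <number>12345</number>
    <name>Example Customer</name>
  </customer>
  <order>
    <number>1234</number>
    <total>213.12</total>
  </order>
</model>
"""


def as_rows(root):
    """Group entity elements by name, one dict of properties per instance."""
    tables = {}
    for entity in root:  # element with complex content ~ a row in a table
        row = {prop.tag: prop.text for prop in entity}  # simple content ~ a cell
        tables.setdefault(entity.tag, []).append(row)
    return tables


tables = as_rows(ET.fromstring(MODEL_XML))
```

Relationships between the two entities are deliberately left out here, just as in the first cut described above.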

Figure 20–1. Entity-relationship diagram

Example 20–1. A simple representation of relational entities in XML
