3

Everybody Likes Jazz

In this chapter we will model a small knowledge base. This will provide us with the information structures from which we can draw examples for the following chapters. I have chosen an example about jazz music and jazz musicians, first, because I think this example is pretty cool, and second, because it allows us to apply our modeling techniques to an area that is not so well understood as the classical purchase order example.

As you probably know, the relationships among jazz musicians are manifold and complex. New bands and projects are set up all the time, and there are many forms of collaboration. In this respect, jazz music very much resembles electronic business, where business relations are much more short-lived than in the old economy. This chapter develops this example in detail, starting with an informal verbal description and then formalizing this description into a conceptual model.

3.1 INFORMAL DESCRIPTION

A popular method for modeling an information domain is to start with an informal, verbal description of the scenario. The scenario should be described in short, simple sentences.

image

The actual relationships are, as a matter of fact, much more complicated. For example, we could include a full taxonomy for musical instruments and styles. But for the purpose of this example, this description might do.

To prepare the construction of the conceptual model, we perform a simple grammatical analysis. In each sentence we identify the nouns (jazz musician, person, name, birth date, band, collaboration, location, album, etc.) and verbs (is, has, collaborate, plays, etc.). This will help us to identify relevant information items.

3.2 THE CONCEPTUAL MODEL, FIRST DRAFT

We are now going to transform this informal description into a more formal conceptual model. In traditional conceptual modeling (such as ERM or Object Role Modeling), nouns would end up as entities (or attributes) and verbs would end up as relationships. But we will use AOM instead, so both nouns and verbs will become assets. This simplification will spare us the classical design dilemma: Do I choose an entity or relationship to model an item that could be both? In our example, this is collaboration-collaborate. In one sentence it behaves like an entity; in another sentence it acts as a relationship. In AOM this dilemma does not exist, and the only choice is which name to choose for the asset. In such cases we usually decide for the noun form, which is collaboration.

When modeling verbs as assets, there are two notable exceptions:

• The verb has indicates either that an asset has a property, as in

    A person has a name and a birth date.

    or that an asset aggregates other assets, as in

    An album has one or several tracks.

In this case we represent the verb has with a simple arc leading from asset type album to asset type track. As explained in Section 2.5.1, it is the asset type album that could be regarded as a relationship between tracks.

• The expression is a indicates a classification:

    A jazz musician is a person.

The noun on the right-hand side (person) is usually the more general term than the noun on the left-hand side (jazz musician). Again, we represent the verb is with a simple arc leading from jazz musician to person. We indicate the special role of this connection by decorating the arc with is_a.
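The treatment of nouns and verbs described above can be sketched in a few lines of Python. This is only an illustration, not AOM notation: the class names Asset and Arc and their fields are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Arc:
    """A directed connection to another asset type (hypothetical sketch)."""
    target: str                   # name of the asset type the arc points to
    label: Optional[str] = None   # e.g. "is_a" for a classification arc


@dataclass
class Asset:
    """An asset type with simple properties and outgoing arcs (hypothetical sketch)."""
    name: str
    properties: List[str] = field(default_factory=list)
    arcs: List[Arc] = field(default_factory=list)


# "A person has a name and a birth date."  -> "has" yields properties
person = Asset("person", properties=["name", "birthDate"])

# "An album has one or several tracks."    -> "has" yields a plain arc
track = Asset("track")
album = Asset("album", arcs=[Arc("track")])

# "A jazz musician is a person."           -> "is" yields an arc decorated with is_a
jazz_musician = Asset("jazzMusician", arcs=[Arc("person", label="is_a")])

# Any other verb becomes an asset of its own; we use the noun form as its name.
collaboration = Asset("collaboration", arcs=[Arc("jazzMusician"), Arc("album")])
```

Note that only has and is a receive special treatment; collaborate ends up as an asset just like album or person.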

3.3 ASSET OR PROPERTY?

There is still a design decision to make. Especially for aggregations, we have to decide if we want to model the aggregated items as assets or as properties of the aggregating item. This distinction is not always easy. However, there are a few guidelines:

• Anything that plays a certain role in the context of our business is definitely an asset. So, the decision about what becomes an asset and what a property may depend on the business process. For example, in the context of our jazz knowledge base, it may be sufficient to model instrument as a property. But if we plan to implement a supply chain for a music shop, instrument would definitely be an asset, and an important one.

• In many cases the distinction between a property and an asset can be made using a simple rule: A property can belong to an asset, but an asset cannot belong to a property. For example, a duration cannot have a track.

• An item that is only connected to a single asset is always a candidate for becoming a property. In contrast, an item that has other connections, too, must be modeled as an asset. Take for example:

    An album has one or several tracks of a given duration.

    track could be modeled as a property of album if we did not have the following:

    An album may have one or several samples.

    and

    A sample provides an MP3 URL for a track.

    There is a cross-reference between sample and track that could not be modeled if sample and track were properties. sample puts track in relation to an MP3 URL. So we must model both track and sample as separate assets, but we can model the MP3 URL as a property of sample.

• Because AOM allows complex properties, we will find that complex information items that must be modeled in classical modeling methods as separate entities may be modeled as structured properties in AOM. This will result in a compact model.
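The third guideline (count the connections of an item) lends itself to a small mechanical check. The following Python sketch is a rough aid only; the function name classify and the connection list are our own reading of the album, track, and sample sentences, and the other guidelines still require human judgment.

```python
from collections import defaultdict

# Each pair means "these two items are connected" in the informal description.
connections = [
    ("album", "track"),      # An album has one or several tracks ...
    ("track", "duration"),   # ... of a given duration.
    ("album", "sample"),     # An album may have one or several samples.
    ("sample", "track"),     # A sample provides an MP3 URL for a track.
    ("sample", "mp3Url"),
]


def classify(pairs):
    """Label items with one connection as property candidates, all others as assets."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {
        item: "property candidate" if len(others) == 1 else "asset"
        for item, others in neighbours.items()
    }


result = classify(connections)
```

With these connections, duration and mp3Url come out as property candidates, while track and sample have to be assets because of the cross-reference between them.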

3.3.1 The Jazz Model

Let’s discuss the model, shown in Figure 3.1, from left to right. First, a style has a name. This is not mentioned in the informal description. Informal descriptions usually make assumptions about the background knowledge of the reader, for example, the knowledge that most things and concepts do have a name. So, we have introduced a name, and we have declared it as a key, assuming that the name of a style is unique.

image

Figure 3.1 The Jazz model, Draft 1.

We have introduced two subproperties for the period, defining the start and the end of the period. Again, this stems from background knowledge. Periods do have a start and an end. At this point we do not determine how precise the start and end date should be. Here, for describing the period when a certain jazz style was dominant, it would be sufficient to specify both dates by year only. Alternatively, we could specify a period by giving a start date and the length of the interval. Notice that we have also introduced a description property. Although not required by the informal description, it may be useful to describe the style in a few words.

The asset belongsTo establishes a relationship between a style and a jazz musician. This is a many-to-many relationship because one jazz musician may belong to several styles during his or her life, and of course many jazz musicians belong to one style. This relationship is attributed again with a period, which may differ from the period defined in the style. This attribute defines a period during which a given jazz musician belongs to a given style.

The asset jazzMusician does not have its own properties but inherits everything from the asset person. jazzMusician itself is marked as an abstract asset, indicated by the solid label box. So there will be no jazzMusician instances. Instances will be instrumentalists, jazzComposers, or jazzSingers.

The assets instrumentalist, jazzComposer, and jazzSinger have a property instrument. In case of an instrumentalist, this property is constrained by the cardinality [1..*] (+) because an instrumentalist has to play at least one instrument. The others are free to play as many instruments as they like, or not to play any instrument at all.

The definition of person is straightforward. At this stage we have declared the name property as a key, which will cause us some trouble later. We have made the property birthDate optional because of the sentence “The birth date may not be known.”

The asset influences relates jazz musicians to other jazz musicians. Again, this is a many-to-many relationship, as one musician can influence many others but also can be influenced by several others. For this asset, the definition of role names is mandatory to differentiate both arrows leading to jazzMusician.

The asset produces describes the relationship between jazz musicians and solo albums. A jazzMusician may produce several solo albums, so this is a one-to-many relationship.

The asset collaboration acts as a classification for the various concrete collaborations such as jamSession, project, and band. Because it does not have its own instances, it is marked as abstract. Collaboration relates at least two jazz musicians (otherwise it wouldn’t be a collaboration) to an unlimited set of albums.

The modeling of album is straightforward. To identify an album uniquely, we have chosen both properties publisher and productNo to form a composite key. Because composite keys must be named, we have given it the name albumKey.
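To see what a named composite key means in practice, consider the following Python sketch. It is an illustration under assumptions: the class AlbumStore, the extra property title, and the sample values are invented for this example and are not part of the Jazz model.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Album:
    publisher: str
    productNo: str
    title: str  # assumed extra property, only here to have some payload

    @property
    def albumKey(self) -> Tuple[str, str]:
        """The composite key: publisher and productNo together identify an album."""
        return (self.publisher, self.productNo)


class AlbumStore:
    """A tiny in-memory store that enforces uniqueness of albumKey (hypothetical)."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], Album] = {}

    def add(self, album: Album) -> None:
        if album.albumKey in self._by_key:
            raise KeyError(f"duplicate albumKey: {album.albumKey}")
        self._by_key[album.albumKey] = album

    def get(self, publisher: str, productNo: str) -> Album:
        return self._by_key[(publisher, productNo)]


store = AlbumStore()
store.add(Album("Example Records", "ER-001", "First Album"))
# The same product number from another publisher is a different album.
store.add(Album("Other Label", "ER-001", "Same Number, Other Publisher"))
```

Neither publisher nor productNo alone would be sufficient here, which is why both properties form the key.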

We have chosen to model the verb reviewed as a noun and to attribute it with a publishing date of the review. The result is asset review. This asset relates a critic, a magazine, and an album or jazz musician to each other—a ternary relationship. The choice between album and jazz musician is modeled via a cluster.

The asset critic inherits everything from asset person but overrides the property name because it needs only one occurrence of first. The modeling of asset magazine is straightforward. The assets band and project are very similar. Both have a name and a period during which they exist. We assume that the name is unique, so we use it as a key.

The asset jamSession is different. A jam session is performed at a certain time and at a certain place. We have modeled this with property performedAt. Because we are not interested in a particular sequence of location and time, we have used the operator & here. We have declared performedAt as a key, which should be sufficiently unique.

3.4 NORMALIZATION

After we have obtained a first draft of our model, we should normalize it. Unlike relational technology, XML and object-oriented formats allow a physical data format that follows the structures of the actual business data very closely. There is no need to break complex information items into a multitude of “flat” tables. We will find that an XML document can represent a conceptual asset almost unmodified. This does not mean that no normalization is required. We must still make sure that our information model does not have redundancies, and that we end up with an implementation that not only consistently matches the real-world relationships between information items but is also easy to maintain. We make sure that

• Asset types are primitive; that is, their properties do not contain information structures that could be modeled as independent asset types. For example, the asset type album must not embed data from jazzMusician.

• Asset types are minimal; that is, they do not contain redundant properties, meaning none of their properties can be derived from other properties. For example, the asset type person must not contain a property age, as this can be derived from birthDate.

• Asset types must be complete; that is, other assets that may be present in the real-world scenario can always be derived from the asset types defined in the model. Our model is not complete. A jazz album typically lists the participating musicians and which instruments each musician played on this album. This requires that we introduce a sentence like

    A jazz musician plays one or several instruments on an album.

    into our informal description and model it appropriately (see Section 3.5).

• Asset types must not be redundant; that is, none of the defined asset types in the model can be derived from other asset types in the model. In our example, we have a redundant asset. A band is a kind of project; the main difference is that it exists over a longer period of time and probably produces more albums. On an informal level, there is a semantic difference between both, but structurally they are the same.

    We fix this by deleting the asset band. In order to allow instances of band, however, we decorate asset project with two display labels: band and project. The consequence is that in the schema, both are treated equally but instances can have either name.

• All asset types must have a unique meaning.

• Assets should have a key. Keys must be minimal; that is, they must consist of the smallest set of properties that can uniquely identify an instance. In our example, not every asset has a key. (For example, belongsTo, influences, review, critic, and magazine don’t have a key.) We should introduce suitable keys for these assets. jazzMusician, instrumentalist, jazzSinger, and jazzComposer do not need their own key, because they inherit one from person. If an asset type does not have suitable properties that can act as keys, we can easily equip it with some kind of a unique property (for example, by generating a UUID for each instance).

In particular, keys are required when an asset has outgoing arcs and we plan to implement the model in a relational environment. Here, in our XML environment, it is very likely that we will implement the triangle album, track, sample with relational techniques (such graphs cannot be reduced to tree structures). Therefore we equip asset track with a new property trackNo that we declare as a key.
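The minimality rule from the list above can be illustrated with the age example. In the following Python sketch (the class layout and the method name age_on are assumptions made for illustration), age is not stored but computed from birthDate whenever it is needed, so the two values can never contradict each other.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Person:
    name: str
    birth_date: Optional[date] = None  # optional: the birth date may not be known

    def age_on(self, day: date) -> Optional[int]:
        """Derive the age on a given day instead of storing it as a property."""
        if self.birth_date is None:
            return None
        had_birthday = (day.month, day.day) >= (self.birth_date.month, self.birth_date.day)
        return day.year - self.birth_date.year - (0 if had_birthday else 1)


# Hypothetical person with made-up dates.
someone = Person("Example Person", date(1920, 6, 15))
unknown = Person("Person Without Birth Date")
```

A stored age property would have to be updated every year; the derived value is always consistent with birthDate.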

3.5 PARTITIONED NORMAL FORM

While the steps mentioned before result in a pretty robust model, there is one more thing we can do. Assets ultimately result in XML elements or documents, and can thus be subject to transformations (for example, via an XSLT stylesheet). To make the keys robust against such transformations, we should make sure that each asset is in Partitioned Normal Form (PNF).

An asset type or property is in Partitioned Normal Form (PNF) if the atomic properties of an asset constitute a key of the asset and all non-atomic properties and subproperties are in Partitioned Normal Form themselves.

Or, in other words:

All complex structures in the model (assets and complex properties) must have atomic child nodes that can act as a key.

What is PNF good for? If we plan to store assets in relational databases, PNF is essential. Relational technology requires us to fragment all complex structures into flat relational tables. Keys that span complex structures would be lost during such a transformation to First Normal Form (1NF) (see Section 11.5). But also in an XML environment, keys constituted from atomic fields are a good idea. For cross-references, XML Schema allows multifield keys (see Section 5.3.17), but each field must be atomic. DTDs and Relax NG, however, are even more limited: They allow only a single atomic field as a key for cross-references.

In our example, the following assets are not in PNF:

• person, because the key name (first, middle?, last) is a composite. A solution would be to introduce a personal ID. Here, we opt to introduce an atomic ID composed from last name, middle name, and first name, such as MingusCharles.

• jamSession, because the key performedAt(time&location) is a composite. Here, we opt for a different solution. We resolve the property performedAt into two independent properties: time and location. These two properties are atomic and can thus constitute a multifield primary key that conforms to PNF. An implementation of this key with DTDs or Relax NG would, however, cause troubles because these schema languages do not support multifield keys.

    Because AOM requires us to name a composite key, we decorate this key with the name jsKey.
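The PNF condition can be checked mechanically. The following Python sketch is a simplified reading of the definition, not a complete implementation: the class Node and the function in_pnf are hypothetical, and for complex properties without a declared key the sketch simply assumes that any atomic child could act as a key.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """An asset or property; a node without children is atomic (sketch only)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    key: List[str] = field(default_factory=list)  # names of the children forming the key


def in_pnf(node: Node) -> bool:
    """Return True if the node and all of its complex children are in PNF."""
    if not node.children:
        return True  # atomic nodes are trivially in PNF
    atomic = {child.name for child in node.children if not child.children}
    if node.key:
        key_ok = set(node.key) <= atomic  # a declared key may use atomic fields only
    else:
        key_ok = bool(atomic)             # assumption: any atomic child could act as key
    return key_ok and all(in_pnf(child) for child in node.children)


name = Node("name", [Node("first"), Node("middle"), Node("last")])

person_draft1 = Node("person", [name, Node("birthDate")], key=["name"])
person_draft2 = Node("person", [Node("ID"), name, Node("birthDate")], key=["ID"])

jam_draft1 = Node("jamSession",
                  [Node("performedAt", [Node("time"), Node("location")])],
                  key=["performedAt"])
jam_draft2 = Node("jamSession", [Node("time"), Node("location")],
                  key=["time", "location"])
```

Under these assumptions, the first-draft versions of person and jamSession fail the check because their keys are complex properties, while the versions with the atomic ID and with the resolved properties time and location pass.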

Figure 3.2 shows our conceptual Jazz model after we have applied the changes suggested by normalization. We have made the following changes:

image

Figure 3.2 The Jazz model, Draft 2.

• Removed the redundancy between assets band and project by deleting asset band and decorating asset project with the two display labels band and project.

• Introduced a new property ID into asset person and declared it as the primary key.

• Resolved property performedAt in asset jamSession into time and location. We declared the combination of these two properties as a primary key and named it jsKey.

• Introduced a new asset plays that relates albums and jazz musicians. It is attributed with an instrument property. At least one instrument must be specified. (For a jazz singer, that would be “vocals.”)

• Factored out the definition of complex property period into an abstract asset period. We use this asset as a type definition. We have also improved this definition by making the subproperty to optional. This allows us to model periods that have not ended yet.
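Two of these changes are easy to illustrate in code. The following Python sketch is only an approximation of the model: the field names start and end stand in for the two subproperties of period (the text names only the subproperty to), years are used as the date precision, and the helper person_id is a hypothetical way of composing the atomic ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Period:
    start: int                  # start of the period; a year is precise enough here
    end: Optional[int] = None   # subproperty "to"; None means the period has not ended

    def covers(self, year: int) -> bool:
        """Check whether a year falls into the period; open periods never end."""
        return self.start <= year and (self.end is None or year <= self.end)


def person_id(last: str, first: str, middle: str = "") -> str:
    """Compose an atomic ID from last name, middle name, and first name."""
    return f"{last}{middle}{first}"


closed = Period(1940, 1955)    # made-up years for illustration
still_running = Period(1990)   # no end given: the period has not ended yet
mingus = person_id("Mingus", "Charles")
```

Because the ID is a single atomic string, it can serve as a key in every schema language discussed so far, although uniqueness now depends on no two persons sharing the same full name.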

3.6 RESOLVING is_a RELATIONSHIPS

In the next step we “flatten” the model by resolving some of the is_a relationships. We do this to prepare the model for implementation with different technologies. While object-oriented technologies are well suited to capture deep hierarchies of superclasses and subclasses (although this may sometimes result in less than well maintainable implementations), the implementation of such data structures with relational technology or with XML would be rather awkward. XML Schema does support inheritance relationships between data types (although only single inheritance), but it does not support inheritance between document nodes.

For the purpose of manual conversion of the Jazz model into XML schemata (see Chapter 8), it is a good idea to resolve the is_a relationships wherever this is possible. When using a modeling tool such as KLEEN [KLEEN2002], this step should not be necessary because the modeling tool should be able to resolve inheritance relations before generating code.

We have the following options:

• Explicitly copy the features of the parent into the child asset types, then remove the parent asset. For example, we could copy the arcs and properties of asset type collaboration into the asset types jamSession and project. This would also allow us to sharpen the cardinality constraints for jamSession: A jam session produces at most one album.

    For asset jazzMusician, this operation would be far from simple, despite the fact that there are no properties to inherit. We would also need to copy the incoming and outgoing arcs. For the incoming arcs (from influences, collaboration, produces, plays, belongsTo), we would need to introduce clusters at the origin point of each arc.

• Fold the child assets into the parent asset. This is possible when the children don’t differ very much from each other. The result is a very compact model.

    Take for example instrumentalist, jazzSinger, and jazzComposer. These assets only differ in the cardinality of instrument. If we can tolerate losing that differentiation (we could later remedy this loss by introducing an explicit constraint), we move instrument into the parent asset jazzMusician, then remove the children. The cardinality of instrument is set to “*” (obtained by union of the individual cardinalities).

What remains is to introduce a feature that indicates the type of the child instance. Here we have two options:

• Create a property that specifies the instance type. For example, we can indicate instrumentalist, jazzSinger, and jazzComposer by an extra property named kind. We can declare the property as an enumeration type with the values instrumentalist, jazzSinger, and jazzComposer. Note that with this approach, the asset instances are no longer named instrumentalist, jazzSinger, and jazzComposer, but jazzMusician.

    To remedy the cardinality problem, we can introduce a constraint saying that kind must either be different from “instrumentalist” or there must be at least one instrument child.

image

• Indicate the child type by display labels in the parent type. For example, we could add three display labels instrumentalist, jazzSinger, and jazzComposer to jazzMusician. Instances of jazzMusician would then be instrumentalist, jazzSinger, or jazzComposer instances.

    To remedy the cardinality problem, we can introduce a constraint:

image

For asset jazzMusician, the second option (using multiple display labels) would be the most elegant option. But for tutorial purposes the additional property kind is created.
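The option with the extra property kind can be sketched in Python as follows. This is not the constraint expression used in the model itself; the class layout, the field names, and the use of a ValueError are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Kind(Enum):
    INSTRUMENTALIST = "instrumentalist"
    JAZZ_SINGER = "jazzSinger"
    JAZZ_COMPOSER = "jazzComposer"


@dataclass
class JazzMusician:
    """The folded asset: former child types are distinguished by kind."""
    id: str
    kind: Kind
    instruments: List[str] = field(default_factory=list)  # cardinality "*"

    def __post_init__(self) -> None:
        # Explicit constraint: kind differs from instrumentalist,
        # or there is at least one instrument.
        if not (self.kind is not Kind.INSTRUMENTALIST or len(self.instruments) >= 1):
            raise ValueError("an instrumentalist must play at least one instrument")


singer = JazzMusician("ExampleSinger", Kind.JAZZ_SINGER)               # no instrument: fine
bassist = JazzMusician("ExampleBassist", Kind.INSTRUMENTALIST, ["bass"])
```

The relaxed cardinality "*" alone would accept an instrumentalist without instruments; the explicit constraint restores the rule that was lost by folding the child assets into the parent.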

After applying these operations, our model would look like Figure 3.3 (page 82), which reflects the following changes:

image

Figure 3.3 The Jazz model, Draft 3.

• We have combined the assets instrumentalist, jazzComposer, and jazzSinger into a generic asset jazzMusician. In this asset we have introduced a new property, kind. The (yet undefined) type of this new property is restricted by the enumeration instrumentalist, jazzComposer, jazzSinger. To capture the restriction for instrumentalists, we have defined an explicit constraint.

• We have resolved the abstract asset collaboration into the concrete assets jamSession and project. These assets only inherited arcs from collaboration; there were no properties to inherit.

• We did not resolve abstract asset person into the concrete assets jazzMusician and critic. This would introduce too many redundant definitions into the model. We also want to keep at least one abstract asset, to see how we can deal with it later, during implementation.

3.7 INTRODUCING LEVEL 2 STRUCTURES

In our model we use Level 2 Structures (L2S) to model business objects. Business objects are assets that play a prominent role in our scenario. Identifying a business object requires that we have an idea not only about the structure of the information, but also about the purpose of that information.

In our example, all jazzMusician asset types, style, all collaboration asset types, album, and review could be L2S. Jazz musicians are clearly the most important topic in our knowledge base, but similarly important are style and the various collaborations. album could play a role if we plan to connect our knowledge base with an online shop for CDs. The asset magazine does not play a prominent role in our scenario; therefore, we include it in the L2S review.

After determining all the dominant assets in our model, we group the remaining assets around these selected assets, demarcate these groups with a Level 2 box, and arrive at the diagram shown in Figure 3.4.

image

Figure 3.4 The Jazz model, Draft 4.

Remember the constraint that must be enforced when constructing L2S from assets:

Starting from the identifying asset of an L2S, we must be able to reach any asset belonging to that L2S by following the arcs in the indicated direction.

This constraint will allow us to interpret each L2S as an aggregation and later make it easy to implement the L2S in the form of hierarchical data models such as XML documents.

When we check this constraint for our model, we encounter three problems:

• From the assets belongsTo and influences, both arrows lead to asset jazzMusician. This is bad, because when starting at jazzMusician, we cannot reach belongsTo and influences.

• Asset produces cannot be reached from jazzMusician.

• Asset plays cannot be reached from album.

To solve these problems, we simply reverse one of the arcs for each of the assets belongsTo and influences. This results in a slightly different interpretation. We are now saying:

A jazzMusician has a “belonging” to a style.

and

A jazzMusician has influences from other jazz musicians.

In the case of influences, the decision as to which of the two arrows to reverse depends on which jazz musician should be assigned influence assets: the one who is influenced, or the one who influences others. It is better to take the first option: Jazz musicians might tell you who influenced them, but they are not likely to tell you who they influenced.

We also reverse the arcs that connect album with plays and jazzMusician with produces, so that they now lead from album to plays and from jazzMusician to produces. We decorate each reversed arc with an asterisk to remove any cardinality constraint and remove its role name.
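The reachability constraint for an L2S is a plain graph problem, so it can be checked with a breadth-first search. In the following Python sketch, the function unreachable and both arc lists are assumptions: the arcs are our reading of the text for the L2S jazzMusician, not a copy of the diagrams.

```python
from collections import deque


def unreachable(identifying, members, arcs):
    """Return the members of an L2S that cannot be reached from its identifying
    asset when arcs (source, target) are followed in arrow direction only."""
    members = set(members)
    seen = {identifying}
    queue = deque([identifying])
    while queue:
        current = queue.popleft()
        for source, target in arcs:
            # Only arcs that stay inside the L2S count for this check.
            if source == current and target in members and target not in seen:
                seen.add(target)
                queue.append(target)
    return members - seen


l2s_members = {"jazzMusician", "belongsTo", "influences", "produces"}

# Before the fix: the relationship assets all point to jazzMusician.
arcs_before = [
    ("belongsTo", "jazzMusician"), ("belongsTo", "style"),
    ("influences", "jazzMusician"), ("influences", "jazzMusician"),
    ("produces", "jazzMusician"), ("produces", "album"),
]

# After the fix: one arc per relationship asset has been reversed.
arcs_after = [
    ("jazzMusician", "belongsTo"), ("belongsTo", "style"),
    ("jazzMusician", "influences"), ("influences", "jazzMusician"),
    ("jazzMusician", "produces"), ("produces", "album"),
]

problems_before = unreachable("jazzMusician", l2s_members, arcs_before)
problems_after = unreachable("jazzMusician", l2s_members, arcs_after)
```

Before the reversal, belongsTo, influences, and produces are reported as unreachable; afterward the set of problems is empty, so the L2S can be read as an aggregation rooted in jazzMusician.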

We also take the opportunity to fix a problem with keys. The asset review definitely needs a key, because it is an identifying asset of L2S review. The identifying asset of an L2S should indeed always have a key, because otherwise, instances of such an L2S could become inaccessible when stored in a database. We therefore have introduced a property ID for asset review, which could be a generated identifier such as a UUID or a URL for a web page. Figure 3.5 shows the results.

image

Figure 3.5 The Jazz model, Draft 5.

There is still one problem with asset influences: Jazz musicians hardly influence themselves, but this is exactly what we have specified. Remember that arcs that are local to an L2S are also local to instances of the L2S. To allow a jazz musician to be influenced by other jazz musicians, we must loosen the range constraint of the arc influencedBy. We do this by decorating this arc with >jazzMusician (the display name of the L2S).

Structurally, our conceptual model is now complete. We finish the definition of the model by rendering a few more details, such as global model settings (default namespace, default type system, and default constraint language), and by decorating atomic properties with data types from XML Schema (see Section 5.2). There is one exception: Later, we want the property description to contain complex XHTML content, but at the moment we do not want to specify this further. We therefore extract this property as an empty asset for later detailing. Figure 3.6 (page 86) shows the results.

image

Figure 3.6 The Jazz model, final draft.
