1.2. Modeling Approaches

When we design a database for a particular business domain, we create a model of it. Technically, the business domain being modeled is called the universe of discourse (UoD), since it is the universe (or world) that we are interested in discoursing (or talking) about. The UoD or business domain is typically “part” of the “real world”. To build a good model requires a good understanding of the world we are modeling, and hence is a task ideally suited to people rather than machines. The main challenge is to describe the UoD clearly and precisely. Great care is required here, since errors introduced here filter through to later stages in software development. The later the errors are detected, the more expensive they are to remove.

A person who models the UoD is called a modeler. If we are familiar with the business domain, we may do the modeling ourselves. If not, we should consult with others who, at least collectively, understand the business domain. These people are called domain experts or subject matter experts. Modeling is a collaborative activity between the modeler and the domain expert.

Since people naturally communicate (to themselves or others) with words, pictures, and examples, the best way to arrive at a clear description of the UoD is to use natural language, intuitive diagrams, and examples. To simplify the modeling task, we examine the information in the smallest units possible: one fact at a time.

The model should first be expressed at the conceptual level, in concepts that people find easy to work with. Figure 1.1 depicted a model in terms of relational database structures. This is too far removed from natural language to be called conceptual. Instead, relational database structures are at the level of a logical data model. Other logical data models exist (e.g., network, XML schema, and object-oriented approaches), and each DBMS is aligned with at least one of these. However, in specifying a draft conceptual design, the modeler should be free of implementation concerns. It is a hard enough job already to develop an accurate model of the UoD without having to worry at the same time about how to translate the model into data structures specific to a chosen DBMS.

Implementation concerns are of course important, but should be ignored in the early stages of modeling. Once an initial conceptual design is created, it can be mapped down to a logical design in any data model we like. This flexibility also makes it easier to implement and maintain the same application on more than one kind of DBMS.

Although most applications involve processes as well as data, we’ll focus on the data, because this perspective is more stable, and processes depend on the underlying data. Three information modeling approaches are discussed: Entity-Relationship modeling (ER), fact-oriented modeling, and object-oriented modeling.

Any modeling method comprises a notation as well as a procedure for using the notation to construct models. To seed the data model in a scientific way, we need examples of the kinds of data that the system is expected to manage. We call these examples data use cases, since they are cases of data being used by the system. They can be output reports, input screens, or forms and can present information in many ways (tables, forms, graphs, etc.). Such examples may already exist as manual or computer records. Sometimes the application is brand new, or an improved solution or adaptation is required. If needed, the modeler constructs new examples by discussing the application area with the domain expert.

As an example, suppose our information system has to output room schedules like that shown in Table 1.3. Let’s look at some different approaches to modeling this. It is not important that you understand details of the different approaches at this stage. The concepts are fully explained in later chapters.

Table 1.3. A simple data use case for room scheduling.
Room	Time	Activity Code	Activity Name
20	Mon 9 a.m.	ORC	ORM class
20	Tue 2 p.m.	ORC	ORM class
33	Mon 9 a.m.	XQC	XQuery class
33	Fri 5 p.m.	STP	Staff party
...	...	...	...

Entity-Relationship modeling was introduced by Peter Chen in 1976 and is still the most widely used approach for data modeling. It pictures the world in terms of entities that have attributes and participate in relationships. Over time, many versions of ER arose. There is no single, standard ER notation.

Different versions of ER may support different concepts and may use different symbols for the same concept. Figure 1.2 uses a popular ER notation long supported by CASE tools from Oracle Corporation. Here, entity types are shown as named, soft rectangles (rounded corners). Attributes are listed below the entity type names. An octothorpe “#” indicates the attribute is a component of the primary identifier for the entity type, and an asterisk “*” means the attribute is mandatory. Here, an ellipsis “...” indicates other attributes exist but their display is suppressed.

Figure 1.2. An ER diagram for room scheduling.

Relationships are depicted as named lines connecting entity types. Only binary relationships are allowed, and each half of the relationship is shown either as a solid line (mandatory) or as a broken line (optional). For example, each RoomHourSlot must have a Room, but it is optional whether a Room is involved in a RoomHourSlot. A bar across one end of a relationship indicates that the relationship is a component of the primary identifier for the entity type at that end. For example, RoomHourSlot is identified by combining its hour and room. Room is identified by its room number, and Activity by its activity code.

A fork or “crow’s foot” at one end of a relationship indicates that many instances of the entity type at that end may be associated (via that relationship) with the same entity instance at the other end of the relationship. The lack of a crow’s foot indicates that at most one entity instance at that end is associated with any given entity instance at the other end. For example, an Activity may be allocated many RoomHourSlots, but each RoomHourSlot is booked for at most one Activity.

To its credit, this ER diagram portrays the domain in a way that is independent of the target software platform. For example, classifying a relationship end as mandatory is a conceptual issue. There is no attempt to specify here how this constraint is implemented (e.g., using mandatory columns, foreign key references, or object references). However, the ER diagram is incomplete (can you spot any missing constraints?). Moreover, the move from the data use case to the model is not obvious. While an experienced ER modeler might immediately see that an entity type is required to model RoomHourSlot, this step might be challenging to a novice modeler.

Let’s see if fact-oriented modeling can provide some help. Our treatment of fact-orientation focuses on Object-Role Modeling. ORM began in the early 1970s as a semantic modeling approach that views the world simply in terms of objects (things) playing roles (parts in relationships). For example, you are now playing the role of reading this book, and the book is playing the role of being read. ORM has appeared in a variety of forms such as Natural-language Information Analysis Method (NIAM). The version discussed in this book is based on extensions to NIAM and is supported by industrial software tools.

Regardless of how data use cases appear, a domain expert familiar with their meaning should be able to verbalize their information content in natural language sentences. It is the modeler’s responsibility to transform that informal verbalization into a formal yet natural verbalization that is clearly understood by the domain expert. These two verbalizations, the domain expert’s informal one and the modeler’s formal rephrasing of it, comprise steps 1a and 1b of ORM’s conceptual analysis procedure. Here we verbalize sample data as fact instances that are then abstracted to fact types. Constraints and perhaps derivation rules are then added, and these are themselves validated by verbalization and sample fact populations.

To get a feel for how this works in ORM, suppose that our system is required to output reports like Table 1.3. We ask the domain expert to read off the information contained in the table, and then we rephrase this in formal English. For example, the subject matter expert might express the facts on the top row of the table as follows: Room 20 at 9 a.m. Monday is booked for the activity ‘ORC’, which has the name ‘ORM class’.

As modelers, we rephrase this into two elementary sentences, identifying each object by a definite description: the Room numbered ‘20’ at the HourSlot with day-hour-code ‘Mon 9 a.m.’ is booked for the Activity coded ‘ORC’; the Activity coded ‘ORC’ has the ActivityName ‘ORM class’. Once the domain expert agrees with this verbalization, we abstract from the fact instances to the fact types (i.e., the types or kinds of fact). We might then depict this structure on an ORM diagram and populate it with sample data and counter data (explained shortly) as shown in Figure 1.3.

Figure 1.3. An ORM diagram for room scheduling, with sample and counter data.

By default, entity types are shown in ORM as named, soft rectangles (rounded corners) and must have a reference scheme, i.e., a way for humans to refer to instances of that type. Simple reference schemes may be shown in parentheses (e.g., “(.nr)”), as an abbreviation of the relevant association, e.g., Room has RoomNr. Value types such as types of character strings need no reference scheme and are shown as named, dashed, soft rectangles (e.g., ActivityName).

This book uses the notation of ORM 2 (second generation ORM), as supported by the NORMA (Neumont ORM Architect) tool, an open source plug-in to Microsoft Visual Studio .NET. The previous version of ORM, as supported by Microsoft Visio for Enterprise Architects, depicts object types as ellipses, not soft rectangles. As a configuration option, NORMA allows object types to be displayed as ellipses or hard rectangles.

Unless indicated otherwise, in this book the term “ORM” is understood to mean ORM 2. When specific reference is made to the previous version of ORM, the term “ORM 1” is used. The ORM glossary at the end of this book includes a side-by-side comparison of ORM 1 and ORM 2 notations.

In ORM, a role is a part played in a fact type (relationship or association). A relationship is shown as a named sequence of one or more role boxes, each connected to the object type whose instances play that role. Figure 1.3 includes a ternary (three-role) association, Room at HourSlot is booked for Activity, and a binary (two-role) association Activity has ActivityName.

Unlike ER, ORM makes no use of attributes in its base models. All facts are represented in terms of objects (entities or values) playing roles. Although this often leads to larger diagrams, an attribute-free approach has advantages for conceptual analysis, including simplicity, stability, and ease of validation. If you are used to modeling in ER or the Unified Modeling Language (UML) (see later), this approach may seem strange at first, but please keep an open mind about it.

ORM allows relationships of any arity (number of roles). Each fact type has at least one predicate reading, corresponding to one way of traversing its roles. Any number of readings may be provided for each role ordering. For a binary association, forward and inverse predicate readings may be shown separated by a slash “/”. As in logic, a predicate is a sentence with object holes in it.

Mixfix notation enables the object terms to be mixed in with the predicate reading at various positions (as required in languages such as Japanese). An object placeholder is indicated by an ellipsis “...” (e.g., the ternary predicate “... at ... is booked for ...”). For unary postfix predicates (e.g., “... smokes”) or binary infix predicates (e.g., “... has ...”), the ellipses may be omitted.
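To make the idea of object placeholders concrete, here is a minimal Python sketch that verbalizes fact instances by filling the “...” holes of a mixfix reading from left to right. The function name `verbalize` is hypothetical, and the sketch assumes each object term arrives already formatted as a definite description (e.g., “Room 20”); it is not how any particular ORM tool generates its verbalizations.

```python
def verbalize(reading: str, terms: list[str]) -> str:
    """Fill the '...' placeholders of a predicate reading with object terms, left to right."""
    if "..." not in reading:
        # Ellipses omitted: unary postfix ("smokes") or binary infix ("has").
        if len(terms) == 1:
            return f"{terms[0]} {reading}"
        if len(terms) == 2:
            return f"{terms[0]} {reading} {terms[1]}"
        raise ValueError("ellipses may be omitted only for unary or binary readings")
    parts = reading.split("...")
    if len(parts) - 1 != len(terms):
        raise ValueError(f"reading has {len(parts) - 1} placeholders, got {len(terms)} terms")
    sentence = parts[0]
    for term, tail in zip(terms, parts[1:]):
        sentence += term + tail
    return sentence.strip()


# Sample rows from Table 1.3: (room, hour slot, activity code).
rows = [(20, "Mon 9 a.m.", "ORC"), (20, "Tue 2 p.m.", "ORC"),
        (33, "Mon 9 a.m.", "XQC"), (33, "Fri 5 p.m.", "STP")]
for room, slot, code in rows:
    print(verbalize("... at ... is booked for ...",
                    [f"Room {room}", f"HourSlot '{slot}'", f"Activity '{code}'"]))
```

Because the placeholders are positional, the reading fixes the order of the roles; this is the property that later makes it easy to line up fact table columns with roles.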

For each fact type, a fact table may be added with a sample population to help validate the constraints. Each column in a fact table is associated with one role. The lines beside the role boxes depict internal uniqueness constraints, indicating which roles or role combinations must have unique entries.

ORM schemas may be represented in diagrammatic or textual form, and some ORM tools can automatically transform between the two representations. Models are validated with domain experts in two main ways: verbalization and population.

For example, the uniqueness constraints on the ternary association verbalize as: For each Room and HourSlot, that Room at that HourSlot is booked for at most one Activity; For each HourSlot and Activity, at most one Room at that HourSlot is booked for that Activity.

The ternary fact table shows a satisfying population (each Room-HourSlot combination is unique, and each HourSlot-Activity combination is unique). The uniqueness constraints on the binary verbalize as: Each Activity has at most one ActivityName; Each ActivityName refers to at most one Activity. The 1:1 nature of this association is illustrated by the population, where each column entry occurs only once in its column.
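The uniqueness checks described above can be sketched in Python by treating a fact table as a list of tuples, one position per role. The helper `is_unique` is a hypothetical name; it simply tests whether the entries for a given role combination repeat, using the Table 1.3 rows as the population.

```python
def is_unique(population, roles):
    """True if no two facts share the same entries at the given role positions."""
    seen = set()
    for fact in population:
        key = tuple(fact[i] for i in roles)
        if key in seen:
            return False
        seen.add(key)
    return True


# Room at HourSlot is booked for Activity: 0 = Room, 1 = HourSlot, 2 = Activity.
booking = [(20, "Mon 9 a.m.", "ORC"), (20, "Tue 2 p.m.", "ORC"),
           (33, "Mon 9 a.m.", "XQC"), (33, "Fri 5 p.m.", "STP")]
# Activity has ActivityName: 0 = Activity, 1 = ActivityName.
naming = [("ORC", "ORM class"), ("XQC", "XQuery class"), ("STP", "Staff party")]

print(is_unique(booking, (0, 1)))  # each Room-HourSlot combination occurs once
print(is_unique(booking, (1, 2)))  # each HourSlot-Activity combination occurs once
print(is_unique(booking, (0,)))    # Room alone is not unique: room 20 appears twice
print(is_unique(naming, (0,)) and is_unique(naming, (1,)))  # the 1:1 binary
```

The third check shows why the constraint spans a role combination: no single role of the ternary is unique on its own in this population.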

The solid dot on Activity is a mandatory role constraint, indicating that each instance in the population of Activity must play that role. This verbalizes as Each Activity has some ActivityName. A role that is not mandatory is optional. Since sample data are not always significant, additional data (such as STM in the binary fact type) may be needed to illustrate some rules. The optionality of the other role played by Activity is shown by the absence of STM in its population.
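A mandatory role constraint can be checked in a similar way: every instance in the population of the object type must appear in the column for that role. In this sketch (hypothetical names), the population of Activity is assumed to be every activity that plays any role, and the code ‘ZZZ’ is an invented activity used only to provoke a violation.

```python
def mandatory_violations(object_population, role_players):
    """Instances of the object type that do not play the mandatory role."""
    return set(object_population) - set(role_players)


naming = {"ORC": "ORM class", "XQC": "XQuery class", "STP": "Staff party"}
booked = {"ORC", "XQC", "STP"}  # activities playing the optional booking role

# Assumption: the population of Activity is every activity playing any role.
activities = booked | set(naming)
print(mandatory_violations(activities, naming.keys()))  # empty: every Activity has a name

# 'ZZZ' is a hypothetical activity booked without a name, violating the constraint.
booked_with_zzz = booked | {"ZZZ"}
print(mandatory_violations(booked_with_zzz | set(naming), naming.keys()))
```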

Since ORM schemas can be specified in unambiguous sentences backed up by illustrative examples, it is not necessary for domain experts to understand the diagram notation at all. Modelers, however, find diagrams very useful for thinking about the universe of discourse.

To double check a constraint, a counterexample to that constraint may be presented. The counterrows appended to the fact tables test the uniqueness constraints. For instance, the first row and counterrow of the ternary indicate that room 20 at 9 a.m. Monday is booked for both the ORC and XQC activities. This challenges the constraint “For each Room and HourSlot, that Room at that HourSlot is booked for at most one Activity”. This constraint may be recast in negative form as: It is impossible that the same Room at the same HourSlot is booked for more than one Activity. The counterexample provides a test case to see if this situation is actually possible.

Concrete examples help domain experts to decide whether something really is a rule. This additional validation step is very useful in cases where the domain expert’s command of language suffers from imprecise or even incorrect use of logical terms (e.g., “each”, “at least”, “at most”, “exactly”, “the same”, “more than”, “if”).

To challenge the constraint that at most one room at the same time is booked for the same activity, the first row and second counterrow of the ternary fact table in Figure 1.3 indicate that both room 20 and room 33 are used at 9 a.m. Monday for the ORC activity. Is this kind of thing possible? If it is (and for some application domains it would be) then this constraint is not a rule, in which case the constraint should be dropped and the counterrow added to the sample data.

However, if our business does not allow two rooms to be used at the same time for the same activity, then the constraint is validated and the counterexample is rejected (although it can be retained as an illustrative counterexample).
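Counterrows can be run as small test cases. Following the text, each counterrow below is paired with the first row of the ternary and checked against both uniqueness constraints; the helper `clashes` is a hypothetical name. Each counterrow challenges exactly one of the two constraints, which is what makes it a focused question for the domain expert.

```python
def clashes(population, roles):
    """Pairs of facts that share entries at the given role positions."""
    first_seen, found = {}, []
    for fact in population:
        key = tuple(fact[i] for i in roles)
        if key in first_seen:
            found.append((first_seen[key], fact))
        else:
            first_seen[key] = fact
    return found


ROOM_HOURSLOT, HOURSLOT_ACTIVITY = (0, 1), (1, 2)
first_row = (20, "Mon 9 a.m.", "ORC")
counterrow_1 = (20, "Mon 9 a.m.", "XQC")  # same Room and HourSlot, other Activity
counterrow_2 = (33, "Mon 9 a.m.", "ORC")  # same HourSlot and Activity, other Room

for name, counterrow in [("counterrow 1", counterrow_1), ("counterrow 2", counterrow_2)]:
    pair = [first_row, counterrow]
    print(name,
          "challenges Room-HourSlot:", bool(clashes(pair, ROOM_HOURSLOT)),
          "challenges HourSlot-Activity:", bool(clashes(pair, HOURSLOT_ACTIVITY)))
```

If the domain expert accepts a counterrow as possible, the corresponding constraint is dropped; otherwise the clash confirms it.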

Compare Figure 1.2 with Figure 1.3. ER is often better than ORM for displaying compact overviews. However, ER models are further removed from natural language and may be harder for the domain expert to conceptualize. In this case, it was more natural to verbalize the first schedule fact as a ternary, but all popular ER notations with industrial support are restricted to binary (two-role) relationships.

Being only binary does not make a language less expressive, since an n-ary association (n > 2) may always be transformed into binaries by co-referencing or nesting (see later). However, such a transformation may introduce an object type that appears artificial to the domain expert, which can hinder communication. Wherever possible, we should try to formulate the model in a way that appears natural to the domain expert.
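One way to see the transformation to binaries is by co-referencing: introduce a RoomHourSlot object identified by the combination of its room and hour slot, and record each booking as a binary fact about it. The Python sketch below (hypothetical names, Table 1.3 rows) shows that no information is lost, since the ternary can be recovered. It also shows the kind of introduced object type, here RoomHourSlot, that a domain expert may find artificial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoomHourSlot:
    """Introduced object type, co-referenced by the combination of room and hour slot."""
    room: int
    hour_slot: str


def to_binary(ternary):
    """Room at HourSlot is booked for Activity -> RoomHourSlot is booked for Activity."""
    return {(RoomHourSlot(room, slot), activity) for room, slot, activity in ternary}


def to_ternary(binary):
    """Recover the ternary facts from the binary ones via the reference scheme."""
    return {(rhs.room, rhs.hour_slot, activity) for rhs, activity in binary}


booking = {(20, "Mon 9 a.m.", "ORC"), (20, "Tue 2 p.m.", "ORC"),
           (33, "Mon 9 a.m.", "XQC"), (33, "Fri 5 p.m.", "STP")}
booked_for = to_binary(booking)
print(to_ternary(booked_for) == booking)  # no information lost

# The Room-HourSlot uniqueness becomes: each RoomHourSlot is booked for at most one Activity.
slots = [rhs for rhs, _ in booked_for]
print(len(slots) == len(set(slots)))
```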

ER notation is less expressive than ORM for capturing constraints or business rules. For example, the ER notation used for Figure 1.2 was unable to express the constraint that activity names are unique or the constraint that it is impossible that more than one room at the same hour slot is booked for the same activity.

ER encourages decisions about relative importance at the conceptual analysis stage. Sometimes this may be seen as an advantage. For example, it is fairly natural to think of activity names as attributes of activities, and hence treat names as less important than activities themselves.

Sometimes, however, early distinctions on relative importance can be disadvantageous. For example, instead of using RoomHourSlot in Figure 1.2, we could model the room schedule information using ActivityHourSlot. Which of these choices is preferable may depend on what other kind of information we might want to record. However, because we have been forced to make a decision about this without knowing what other facts need to be recorded, we may need to change this part of the model later.

In general, if you model a feature as an attribute and find out later that you need to record something about it, you are typically forced to remodel it as an entity type or relationship because attributes can’t have attributes or participate in relationships.

For instance, suppose we record phone as an attribute of Room and then later discover that we want to know which phones support voice mail. Since you rarely know what all the future information requirements will be, an attribute-based model is inherently unstable. Moreover, applications using the model often need to be recoded when a model feature is changed. Since ORM is essentially immune to changes like this, it offers far greater semantic stability.

We have already seen that ORM models facilitate validation by both verbalization and population. Attributes make it awkward to use sample data populations. Moreover, populating optional attributes introduces null values, which may be a source of confusion to nontechnical people.

In light of the aforementioned considerations, it appears that ORM’s fact-oriented approach offers at least some advantages over ER modeling for conceptual analysis. This doesn’t mean that you should discard ER, since it has advantages too (e.g., compact diagrams). You can have your cake and eat it too by using ORM for the initial conceptual analysis and automatically generating an ER view from it when desired.

Even if you decide to use ER throughout, ignoring the ORM notation completely, you should find that applying or adapting the modeling steps in ORM’s conceptual schema design procedure to the ER notation will help you design better ER models.

Now let’s consider Object-Oriented (OO) modeling, an approach that encapsulates both data and behavior within objects. Although used mainly for designing object-oriented program code, it can also be used for database design. Many object-oriented approaches exist, but by far the most influential is the Unified Modeling Language, which has been adopted by the Object Management Group (OMG).

Among its many diagram types, UML includes class diagrams to specify static data structures. Class diagrams may be used to specify operations as well as low-level design decisions specific to object-oriented code (e.g., attribute visibility and association navigability). When stripped of such implementation detail, UML class diagrams may be regarded as an extended version of ER.

A UML class diagram for our example is shown in Figure 1.4. To overcome some of the problems mentioned for the ER solution, a ternary association is used for the schedule information. Because of its object-oriented focus, UML does not require conceptual identification schemes for its classes. Instead, entity instances are assumed to be identified by internal object identifiers (oids).

Figure 1.4. A UML class diagram for room scheduling.

UML has no standard notation to signify that attribute values must be unique for their class. However, UML does allow user-defined constraints to be added in braces or notes in any language. We’ve added {P} to denote primary uniqueness and {U1} for an alternate uniqueness; these symbols are not standard and hence not portable. The uniqueness constraints on the ternary are captured by the 0..1 (at most one) multiplicity constraints. Here “*” is shorthand for “0..*”, meaning “0 or more”. Attributes are mandatory by default.

How well does this UML model support validation with the domain expert? Let’s start with verbalization. Although often less than ideal, implicit use of “has” could be used to form binary sentences from the attributes, but what about the ternary? About the best we can do is something like “Booking involves Room and HourSlot and Activity”, which is pretty useless. What if we replaced the association name with a mixfix predicate, as we did in ORM, e.g., “... at ... is booked for ...”?

This is no use, because UML association roles (or association ends as they are now called) are not ordered. So formally we can’t know if we should read the sentence type as “Room at HourSlot is booked for Activity”, or “Activity at HourSlot is booked for Room” etc. This gets worse if the same class plays more than one role in the association (e.g., Person introduced Person to Person).

UML requires association roles to have names (ORM allows role names, but does not require them), but role names don’t form sentences, which are always ordered in natural language. UML’s weakness with regard to verbalization of facts carries over into its verbalization of constraints and derivation rules.

The UML specification recommends the Object Constraint Language (OCL) for formal expression of such rules, but OCL is simply too mathematical in nature to be used for validation by nontechnical domain experts. In principle, a higher level language could be designed for UML that could be automatically transformed to OCL.

Since verbalization in UML has inadequate support, let’s try validation with sample populations. Not much luck here either. To begin with, attribute-based notations are almost useless for multiple instantiation and they introduce nulls into base populations, along with all their confusing properties.

UML does provide object diagrams that enable you to talk about attributed single instances of classes, but that doesn’t help with multiple instantiation. For example, the 1:1 nature of the association between activity codes and names is transparent in the ORM fact table in Figure 1.3, but is harder to see by scanning several activity objects.

In principle, we could introduce fact tables to instantiate binary associations in UML, but this wouldn’t work for non-binary associations. Why not? In UML you can’t specify a reading direction for an association unless it’s a binary. So there is no obvious connection between an association role and a fact column as there is in ORM.

The best we can do is to name each role and then use role names as headers to the fact table. However, the visual connection of the fact columns to the class diagram would be weak because of the nonlinear layout of the association roles, and the higher the arity of the association, the worse it gets.

In its favor, UML is far richer than ORM or ER in its ability to capture other aspects of application design (e.g., operations, activities, component packaging, and deployment). UML includes diagramming techniques, such as state machine and activity diagrams, to capture business processes. Any full specification of a business domain needs to address these dynamic aspects. If the application is to be implemented in object-oriented code, UML enables more precise descriptions of the programming code structures to be specified (e.g., attribute visibility and association navigability).

If we restrict our attention to conceptual data modeling, however, the ORM notation is significantly richer than ER or UML in its capacity to express business constraints on the data, as well as being far more orthogonal and less impacted by change. As a simple example, consider the output report of Table 1.4. You might like to try modeling this yourself before reading on.

Table 1.4. Another sample output report about Movies.
Movie Nr	Movie Title	Director Name	Director Born	Reviewer Name	Reviewer Born
1	The DaVinci Code	Ron Howard	US	Fred Bloggs	US
				Ann Green	US
2	Crocodile Dundee	Peter Faiman	AU	Ann Green	US
				Ima Viewer	GB
				Tom Sawme	AU
3	Star Peace	Ann Green	US	?	?
...	...	...	...	...	...

One way to model this report in UML is shown in Figure 1.5. Although the population of the sample report suggests that movie titles are unique and that a person can direct only one movie, let’s assume that the domain expert confirms that this is not the case. We should adapt our sample population to illustrate this (e.g., add a new movie 4 with the same title ‘Star Peace’ directed by Ron Howard).

Figure 1.5. A UML class diagram for Table 1.4.

Assuming people are identified simply by their name, Movie and Person classes may be used as shown. The role names “director” and “reviewer” are used here to distinguish the two roles played by Person. Similarly, role names are provided to distinguish the roles played by Movie. In this example, all four role names are required. Association names may be used as well if desired.

Unlike Chen’s original ER notation, UML binary associations are typically depicted by lines without a diamond. While this is convenient, using diamonds only for longer associations is somewhat inconsistent, and UML’s lack of unary relationships is unnatural. In principle, UML does allow diamonds as an alternative notation for binary associations, but in practice this is rarely seen.

In contrast, ORM’s depiction of relationships as a sequence of one or more roles, where each role is associated with a fact table column, provides a uniform, general notation that facilitates validation by both verbalization and sample populations.

The multiplicity constraints indicate that each movie has exactly one director but may have many reviewers and that a person may direct or review many movies. But there is still a missing business rule. Can you spot it?

Figure 1.6 models the same domain in ORM. Here the “◂” before “has” reverses the normal left-to-right reading direction. The rule missing from the UML model is captured graphically by the circled “X” constraint between the role pairs comprising the “directed” and “reviewed” associations. This is called an exclusion constraint.

Figure 1.6. ORM model for Table 1.4, with counterexample for exclusion constraint.

This exclusion constraint verbalizes as No Person directed and reviewed the same Movie or, reading it the other way, No Movie was directed by and was reviewed by the same Person. To validate this rule with the domain expert, you should verbalize the rule and also provide a counterexample. For example, in your model is it possible for Movie 1 to be directed by Ron Howard and also reviewed by Ron Howard? Figure 1.6 includes this counterexample. If the exclusion constraint really does apply, at least one of those two facts must be wrong.
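Over sample populations, checking this exclusion constraint reduces to a set intersection. In this hypothetical sketch, each fact type is stored as a set of (movie number, person name) pairs taken from Table 1.4; any pair present in both sets violates the constraint. Ann Green both directs and reviews in the sample, which is allowed because the movies differ.

```python
# Each fact type as a set of (movieNr, personName) pairs, from Table 1.4.
directed = {(1, "Ron Howard"), (2, "Peter Faiman"), (3, "Ann Green")}
reviewed = {(1, "Fred Bloggs"), (1, "Ann Green"), (2, "Ann Green"),
            (2, "Ima Viewer"), (2, "Tom Sawme")}


def exclusion_violations(directed, reviewed):
    """Movie-Person pairs that occur in both fact types."""
    return directed & reviewed


print(exclusion_violations(directed, reviewed))  # empty: the sample satisfies the rule

# The counterexample: Ron Howard also reviews Movie 1.
print(exclusion_violations(directed, reviewed | {(1, "Ron Howard")}))
```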

Some domain experts are happy to work with diagrams and some are not. Some are good at understanding rules in natural language and some are not. But all domain experts are good at working with concrete examples. Although it is not necessary for the domain expert to see the diagram, being able to instantiate any role directly on the diagram makes it easy for you as a modeler to think clearly about the rules.

Although UML has no graphic notation for general exclusion constraints, it does allow you to document constraints in a note attached to the relevant model elements. If a concept is already part of your modeling language, it’s easier to think of it.

Since the exclusion constraint notation is not built into the UML language, it is easy to miss the constraint in developing the model. The same thing goes for ER. In contrast, the ORM modeling procedure prompts you to consider such a constraint and allows you to visualize and capture the rule formally. An ORM tool can then map the constraint automatically into executable code to ensure that the rule is enforced in the implementation.
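As an illustration of what enforcing such a rule in an implementation might look like, here is one possible relational mapping that uses SQLite triggers, driven from Python. This is a sketch under assumptions, not the code NORMA or any other ORM tool actually generates: the table and column names are hypothetical, and the director is stored as a column of Movie because each movie has exactly one director.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (
    movieNr      INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    directorName TEXT NOT NULL            -- each Movie was directed by exactly one Person
);
CREATE TABLE Review (
    movieNr      INTEGER NOT NULL REFERENCES Movie (movieNr),
    reviewerName TEXT NOT NULL,
    PRIMARY KEY (movieNr, reviewerName)
);
-- Exclusion constraint: no Person directed and reviewed the same Movie.
CREATE TRIGGER review_insert_excludes_director BEFORE INSERT ON Review
WHEN EXISTS (SELECT 1 FROM Movie
             WHERE movieNr = NEW.movieNr AND directorName = NEW.reviewerName)
BEGIN SELECT RAISE(ABORT, 'a Person may not direct and review the same Movie'); END;
CREATE TRIGGER review_update_excludes_director BEFORE UPDATE ON Review
WHEN EXISTS (SELECT 1 FROM Movie
             WHERE movieNr = NEW.movieNr AND directorName = NEW.reviewerName)
BEGIN SELECT RAISE(ABORT, 'a Person may not direct and review the same Movie'); END;
CREATE TRIGGER director_update_excludes_review BEFORE UPDATE OF directorName ON Movie
WHEN EXISTS (SELECT 1 FROM Review
             WHERE movieNr = NEW.movieNr AND reviewerName = NEW.directorName)
BEGIN SELECT RAISE(ABORT, 'a Person may not direct and review the same Movie'); END;
""")

# Sample population from Table 1.4.
conn.executemany("INSERT INTO Movie VALUES (?, ?, ?)",
                 [(1, "The DaVinci Code", "Ron Howard"),
                  (2, "Crocodile Dundee", "Peter Faiman"),
                  (3, "Star Peace", "Ann Green")])
conn.executemany("INSERT INTO Review VALUES (?, ?)",
                 [(1, "Fred Bloggs"), (1, "Ann Green"), (2, "Ann Green"),
                  (2, "Ima Viewer"), (2, "Tom Sawme")])


def attempt(sql, params):
    """Run one statement; return False if the database rejects it."""
    try:
        conn.execute(sql, params)
    except sqlite3.DatabaseError as exc:
        print("rejected:", exc)
        return False
    return True


print(attempt("INSERT INTO Review VALUES (?, ?)", (1, "Ron Howard")))  # the counterexample
print(attempt("UPDATE Movie SET directorName = ? WHERE movieNr = ?", ("Ima Viewer", 2)))
print(attempt("INSERT INTO Review VALUES (?, ?)", (3, "Fred Bloggs")))
```

Triggers are only one option; the point is that a constraint captured formally in the model can be turned into a check the database runs on every change, rather than being left to application code.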

ORM diagrams always display semantic domains as object types. The ORM diagram in Figure 1.7(a) includes role names for birthdate and deathdate, shown in square brackets next to the relevant roles. These roles are clearly compatible, as they are both played by the object type Date. In ORM, role names may be used like attribute names in automatically generated attribute-views, as well as in rules specified in attribute style (e.g., deathdate > birthdate).

Figure 1.7. ORM object types are semantic domains.
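The attribute-style rule mentioned above can be sketched in Python by deriving an attribute view from two fact types whose roles are both played by Date, and then evaluating deathdate > birthdate. The person identifiers, dates, and function names are hypothetical. The None returned for a missing deathdate is exactly the kind of null that an attribute view introduces.

```python
from datetime import date

# Two fact types whose second roles are both played by Date (hypothetical data).
was_born_on = {"P1": date(1920, 3, 1), "P2": date(1950, 7, 9)}
died_on = {"P1": date(1990, 5, 2)}  # optional role: P2 has no recorded deathdate


def attribute_view(person):
    """Attribute-style view of a Person, using the role names as attribute names."""
    return {"birthdate": was_born_on.get(person), "deathdate": died_on.get(person)}


def deathdate_after_birthdate(view):
    """The rule deathdate > birthdate, checked only when both values are recorded."""
    born, died = view["birthdate"], view["deathdate"]
    return born is None or died is None or died > born


print([deathdate_after_birthdate(attribute_view(p)) for p in was_born_on])
```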

ER diagrams typically hide attribute domains. For example, the birthdate and deathdate attributes in the Barker ER model shown in Figure 1.7(b) should be based on the domain Date, but this is not represented visually. In ER, attribute domains can be listed in another document.

In UML class diagrams, attribute domains may be listed after the attribute name and multiplicity (if shown), as in Figure 1.7(c). The “[0..1]” multiplicity indicates “at most one”, so the attribute is optional and single-valued. All too often in practice, only syntactic, or value, domains are specified (e.g., String).

An ER diagram might show population and elevation as attributes of City, and an associated table might list the domains of these attributes simply as Integer, despite the fact that it is nonsense to equate a population with an elevation.
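A lightweight way to keep semantic domains apart in code is to give each domain its own type instead of sharing a plain integer. In this Python sketch, `Population` and `Elevation` are hypothetical wrapper types (the unit “metres” is an assumption). Ordering comparisons between them raise TypeError, whereas the underlying ints would compare without complaint.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Population:
    """Semantic domain: number of inhabitants."""
    count: int


@dataclass(frozen=True, order=True)
class Elevation:
    """Semantic domain: height above sea level (assumed to be in metres)."""
    metres: int


def comparable(a, b) -> bool:
    """True if an ordering comparison between a and b is defined."""
    try:
        a < b  # raises TypeError when the two types are unrelated
    except TypeError:
        return False
    return True


print(comparable(Population(5000), Population(12000)))  # same domain
print(comparable(Population(5000), Elevation(300)))     # different domains
print(Population(300) == Elevation(300))                # never equal across domains
```

Both wrappers still hold an int, which corresponds to the syntactic domain Integer; the semantic domain is what stops the two from being mixed.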

Conceptual object types, or semantic domains, provide the conceptual “glue” that binds the various components in the application model into a coherent picture. Even at the lower level of the relational data model, E.F. Codd, the founder of the relational model, argues that “domains are the glue that holds a relational database together” (Codd 1990, p. 45).

The object types in ORM diagrams are the semantic domains, so the connectedness of a model is transparent. This property of ORM also has significant advantages for conceptual queries, since a user can query the conceptual model directly by navigating through its object types to establish the relevant connections. This notion is elaborated further in later chapters.

ER and UML diagrams often fail to express relevant constraints on, or between, attributes. Figure 1.8 provides a simple example. Notice the circled dot over an “X” in the ORM model in Figure 1.8(a). This specifies two constraints: the dot is a mandatory constraint over the disjunction of the two roles (each truck is either bought or leased) and the “X” indicates the roles are exclusive (no truck is both bought and leased). The two constraints collectively provide an xor (exclusive-or) constraint (each truck plays exactly one of the roles).

Figure 1.8. (a) ORM model, (b) UML model, revealing less detail.
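The xor constraint is the conjunction of two checks that are easy to express over populations: every truck plays at least one of the roles (the disjunctive mandatory constraint) and no truck plays both (the exclusion constraint). The truck identifiers and the function name below are hypothetical.

```python
def xor_violations(trucks, bought, leased):
    """Return (trucks playing neither role, trucks playing both roles)."""
    neither = trucks - (bought | leased)  # breaks the disjunctive mandatory constraint
    both = bought & leased                # breaks the exclusion constraint
    return neither, both


# Hypothetical truck identifiers.
trucks = {"T1", "T2", "T3"}
print(xor_violations(trucks, bought={"T1", "T3"}, leased={"T2"}))  # no violations
print(xor_violations(trucks, bought={"T1", "T2"}, leased={"T2"}))  # T3 neither, T2 both
```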

Unlike most versions of ER, UML does provide an xor constraint, but only between associations. Since the UML model in Figure 1.8(b) models these two fact types as attributes instead of associations, it cannot capture the constraint graphically (other than adding a note).

Notice again how the ORM diagram reveals the semantic domains. For instance, tare may be meaningfully compared with maximum load (both are masses) but not with length. In UML this can be made explicit by appending domain names to the attributes. At various stages in the modeling process, it is helpful for the modeler to see all the relevant information in one place.

Another ORM feature is its flexible support for subtyping, including multiple inheritance, based on formal subtype definitions. For example, the subtype LargeUSCity may be defined as a City that is in Country ‘US’ and has a Population > 1000000. As discussed in a later chapter, subtype definitions provide stronger constraints than declarations about whether subtypes are exclusive or exhaustive.
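A formal subtype definition is essentially a predicate over facts already recorded for the supertype. The Python sketch below is one possible reading of the LargeUSCity definition. The City record layout is an assumption, and the sample cities are placeholders rather than real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    name: str
    country: str      # country code, e.g. 'US'
    population: int


def is_large_us_city(city: City) -> bool:
    """Subtype definition: each LargeUSCity is a City in Country 'US' with Population > 1000000."""
    return city.country == "US" and city.population > 1_000_000


def large_us_cities(cities: list[City]) -> list[City]:
    # The subtype population is derived from the definition, not asserted separately.
    return [c for c in cities if is_large_us_city(c)]
```

Because membership is computed from the definition, a city cannot be recorded as a LargeUSCity while its country or population facts contradict that classification.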

In principle, because there are infinitely many kinds of constraints, a textual constraint language is often required for completeness to supplement the diagram. This is true for ER, ORM, and UML models. However, the failure of ER and UML diagrams to include standard notations for many important ORM constraints makes it harder to develop a comprehensive model or to perform transformations on the model.

For example, suppose that in any movie an actor may have a starring role or a supporting role but not both. This can be modeled by two fact types: Actor has starring role in Movie; Actor has supporting role in Movie. The “but not both” condition is expressed in ORM as a pair-exclusion constraint between the fact types. Alternatively, these fact types may be replaced by a single longer fact type: Actor in Movie has role of RoleKind {star, support}.
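In terms of populations, the pair-exclusion constraint says that no (Actor, Movie) pair may appear in both fact types. A minimal Python sketch, assuming each fact type's population is held as a set of (actor, movie) tuples with hypothetical identifiers:

```python
Pair = tuple[str, str]  # (actor, movie)


def pair_exclusion_violations(starring: set[Pair], supporting: set[Pair]) -> set[Pair]:
    """(Actor, Movie) pairs that appear in both fact types, violating the pair-exclusion constraint."""
    return starring & supporting
```

An empty result means the populations satisfy the constraint; any returned pair is an actor recorded with both a starring and a supporting role in the same movie.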

Transformations are rigorously controlled in ORM to ensure that constraints in one representation are captured in an alternative representation. For instance, the pair-exclusion constraint is transformed into the constraint that each Actor-Movie pair has only one RoleKind. The formal theory behind such transformations is easier to apply when the relevant constraints can be visualized.
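The following Python sketch illustrates, under simplifying assumptions, why the two representations are equivalent. It transforms the two binary fact types into the single ternary fact type and back, and checks the uniqueness constraint on Actor-Movie pairs that replaces the pair-exclusion constraint. Populations are modeled as Python sets and lists of tuples; the function names are our own.

```python
from collections import Counter

Pair = tuple[str, str]        # (actor, movie)
RoleFact = tuple[str, str, str]  # (actor, movie, role kind)


def to_role_facts(starring: set[Pair], supporting: set[Pair]) -> list[RoleFact]:
    """Actor in Movie has role of RoleKind {star, support}."""
    return [(a, m, "star") for a, m in sorted(starring)] + [
        (a, m, "support") for a, m in sorted(supporting)
    ]


def from_role_facts(role_facts: list[RoleFact]) -> tuple[set[Pair], set[Pair]]:
    """Inverse transformation back to the two binary fact types."""
    starring = {(a, m) for a, m, k in role_facts if k == "star"}
    supporting = {(a, m) for a, m, k in role_facts if k == "support"}
    return starring, supporting


def uniqueness_violations(role_facts: list[RoleFact]) -> set[Pair]:
    """Actor-Movie pairs with more than one RoleKind."""
    counts = Counter((a, m) for a, m, _ in role_facts)
    return {pair for pair, n in counts.items() if n > 1}
```

Since each input is a set, a pair can occur twice in the ternary population only if it occurs in both binary populations. The uniqueness violations after transformation are therefore exactly the pair-exclusion violations before it.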

Unlike UML and ER, ORM was built from a linguistic basis. To reap the benefits of verbalization and population for communication with and validation by domain experts, it’s better to use a language that was designed with this in mind. The ORM notation is easy to learn and has been successfully taught even to high school students.

We are not arguing here that ER and UML have no value. They do. We are simply suggesting that you consider using ORM’s modeling techniques, and possibly its graphic notation, to facilitate your original conceptual analysis before using an attribute-based notation such as that of ER, UML, or relational tables.

Once you have validated the conceptual model with the domain expert, you need to map it to a DBMS or program code for implementation. At this lower level, you will want to use an attribute-based model, so that you have a compact picture of how facts are grouped into implementation structures. For database applications, you will want to see the table structures, foreign key relationships, and so on. Here a relational or object-relational model offers a compact view, similar to an ER or UML model.

ORM models often take up much more space than an attribute-based model, since they show each attribute as a relationship. This is ideal for conceptual analysis, where we should validate one fact type at a time. However, for logical design, we typically group facts into attribute-based structures such as tables or classes. At the logical design stage, attribute-based models are more useful than ORM models. For example, relational schema diagrams provide a simple, compact picture of the underlying tables and foreign key constraints between them. Also, UML is well suited for the logical and physical design of object-oriented code, since it allows implementation detail on the data model (e.g., attribute visibility and association navigation) and can be used to model behavior and deployment.
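As a rough illustration of grouping facts into attribute-based structures, this Python sketch collects elementary facts of the form (entity, fact type, value) into one row per entity. It is a deliberately simplified assumption, not ORM's full relational mapping procedure. It handles only functional fact types (at most one value per entity), and the fact type names used with it are hypothetical.

```python
from collections import defaultdict

Fact = tuple[str, str, object]  # (entity, fact type, value)


def group_into_rows(facts: list[Fact]) -> dict[str, dict[str, object]]:
    """Group functional elementary facts about the same entity into one row,
    roughly as when several fact types are mapped to a table keyed by the entity."""
    rows: dict[str, dict[str, object]] = defaultdict(dict)
    for entity, fact_type, value in facts:
        if fact_type in rows[entity]:
            # A second value would mean the fact type is not functional,
            # so it would need its own table rather than a column.
            raise ValueError(f"{entity} already has a value for {fact_type!r}")
        rows[entity][fact_type] = value
    return dict(rows)
```

An entity that lacks a fact for some fact type simply has no entry for that column, which corresponds to an optional attribute (a null) in the resulting table.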

Having used ER, ORM, and UML in practice, we’ve found that ORM often makes it easier to get the model right in the first place and to change the model as the business domain evolves. We believe in the method so strongly that we’ve made it the basis for much of the modeling discussion in this book. Once you understand ORM’s modeling principles, you’ll find it much easier to gain a proper understanding of data modeling in ER and UML.

Arguments about modeling approaches can become heated, and not everyone is as convinced of the virtues of ORM as we are. All we ask is that you look objectively at the ideas presented in this book and consider using whatever you find helpful.

Although the book focuses on ORM, it also covers data modeling in other popular notations (e.g., ER, IDEF1X, UML, and relational). These other notations have value too. Even if you decide to stay with ER or UML as your conceptual analysis approach, an insight into ORM should make you a better modeler regardless.
