9.3. Attributes

Like other ER notations, UML allows relationships to be modeled as attributes. For instance, in Figure 9.6(a) the Employee class has eight attributes. The corresponding ORM diagram is shown in Figure 9.6(b).

Figure 9.6. UML attributes (a) depicted as ORM relationship types (b).


In UML, attributes are mandatory and single valued by default. So the employee number, name, title, gender, and smoking status attributes are all mandatory. In the ORM model, the unary predicate “smokes” is optional (not everybody has to smoke). UML does not support unary relationships, so it models this instead as the Boolean attribute “isSmoker”, with possible values True or False. In UML the domain (i.e., type) of any attribute may optionally be displayed after it (preceded by a colon). In this example, the domain is displayed only for the isSmoker attribute. By default, ORM tools usually take a closed world approach to unaries, which agrees with the isSmoker attribute being mandatory.

The ORM model also indicates that Gender and Country are identified by codes (rather than names, say). We could convey some of this detail in the UML diagram by appending domain names. For example, “Gendercode” and “Countrycode” could be appended to “gender:” and “birthcountry:” to provide syntactic domains.

In the ORM model it is optional whether we record birth country, social security number, or passport number. This is captured in UML by appending [0..1] to the attribute name (each employee has 0 or 1 birth country, and 0 or 1 social security number). This is an example of an attribute multiplicity constraint. The main multiplicity cases are shown in Table 9.2. If the multiplicity is not declared explicitly, it is assumed to be 1 (exactly one). If desired, we may indicate the default multiplicity explicitly by appending [1..1] or [1] to the attribute.

Table 9.2. Multiplicities.
Multiplicity   Abbreviation   Meaning                      Note
0..1                          0 or 1 (at most one)
0..*           *              0 to many (zero or more)
1..1           1              exactly 1                    Assumed by default
1..*                          1 or more (at least 1)
n..*                          n or more (at least n)       n ≥ 0
n..m                          at least n and at most m     m > n ≥ 0
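
Each multiplicity in Table 9.2 amounts to a lower and an upper bound on the number of values an attribute may hold. As a minimal sketch in Python, the check could look like the following. The function name satisfies_multiplicity and the use of None to stand for * are illustrative assumptions, not part of UML.

```python
from typing import Optional


def satisfies_multiplicity(count: int, lower: int = 1, upper: Optional[int] = 1) -> bool:
    """Return True if `count` values fit the multiplicity lower..upper.

    upper=None stands for "*" (no upper bound). The defaults mirror the
    UML default multiplicity of exactly one (1..1).
    """
    if count < lower:
        return False
    return upper is None or count <= upper


# Rows of Table 9.2 written as (lower, upper) pairs.
OPTIONAL_ONE = (0, 1)      # 0..1
ZERO_OR_MORE = (0, None)   # 0..* (abbreviated *)
EXACTLY_ONE = (1, 1)       # 1..1 (abbreviated 1)
ONE_OR_MORE = (1, None)    # 1..*
```

For example, satisfies_multiplicity(0, *OPTIONAL_ONE) holds, whereas satisfies_multiplicity(0) does not, because an undeclared multiplicity is taken to be exactly one.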

In the ORM model, the uniqueness constraints on the right-hand roles (including the EmployeeNr reference scheme shown explicitly earlier) indicate that each employee number, social security number, and passport number refers to at most one employee. As mentioned earlier, UML has no standard graphic notation for such “attribute uniqueness constraints”, so we’ve added our own {P} and {Un} notations for preferred identifiers and uniqueness. UML 2 added the option of specifying {unique} or {nonunique} as part of a multiplicity declaration, but this only declares whether instances of collections for multivalued attributes or multivalued association roles may include duplicates, so it can’t be used to specify that instances of single valued attributes or combinations of such attributes are unique for the class.
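
To make the intended meaning of a uniqueness constraint on a single valued attribute concrete, the following Python sketch reports values that are recorded for more than one instance. The helper name duplicated_values and the sample values are hypothetical. Missing values are ignored, on the assumption that several employees may lack an optional passport number without violating uniqueness.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Set


def duplicated_values(values: Iterable[Optional[Hashable]]) -> Set[Hashable]:
    """Return the values that appear more than once, ignoring None."""
    counts = Counter(v for v in values if v is not None)
    return {v for v, n in counts.items() if n > 1}


# Hypothetical attribute values for four employees; None means "not recorded".
passport_numbers = ["P100", None, "P200", None]
ssn_numbers = ["S1", "S2", "S1", None]  # "S1" is recorded twice: a violation
```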

UML has no graphic notation for an inclusive-or constraint, so the ORM constraint that each employee has a social security number or passport number needs to be expressed textually in an attached note, as in Figure 9.6(a). Such textual constraints may be expressed informally, or in some formal language interpretable by a tool. In the latter case, the constraint is placed in braces.

In our example, we’ve chosen to code the inclusive-or constraint in SQL syntax. Although UML provides OCL for this purpose, it does not mandate its use, allowing users to pick their own language (even programming code). This of course weakens the portability of the model. Moreover, the readability of the constraint is typically poor compared with the ORM verbalization.
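
Since UML leaves the constraint language open, the same inclusive-or rule could equally be stated in program code. The sketch below is one hypothetical rendering in Python, not the SQL text used in the figure. The class layout, field names, and sample data are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Employee:
    nr: int
    ssn: Optional[str] = None          # multiplicity [0..1]
    passport_nr: Optional[str] = None  # multiplicity [0..1]


def inclusive_or_violations(employees: List[Employee]) -> List[int]:
    """Return the numbers of employees that have neither identifier."""
    return [e.nr for e in employees if e.ssn is None and e.passport_nr is None]


# Hypothetical sample data.
staff = [
    Employee(101, ssn="S1"),
    Employee(102, passport_nr="P7"),
    Employee(103, ssn="S2", passport_nr="P8"),  # having both is allowed
    Employee(104),                              # has neither: violates the rule
]
```

Here inclusive_or_violations(staff) returns only employee 104, since the constraint is inclusive and employee 103 may hold both identifiers.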

The ORM fact type Employee was born in Country is modeled as a birthcountry attribute in the UML class diagram of Figure 9.6(a). If we later decide to record the population of a country, then we need to introduce Country as a class, and to clarify the connection between birthcountry and Country we would probably reformulate the birthcountry attribute as an association between Employee and Country. This is a significant change to our model. Moreover, any object-based queries or code that referenced the birthcountry attribute would also need to be reformulated. ORM avoids such semantic instability by always using relationships instead of attributes.
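
The effect of this change on dependent code can be sketched in Python. All class, field, and function names here are hypothetical, and the population figure is a placeholder rather than a real statistic. The point is only that the attribute-based query has to be rewritten when Country becomes a class, while a query over separately stored facts does not.

```python
from dataclasses import dataclass
from typing import Optional


# Version 1: birth country as an attribute holding a country code.
@dataclass
class EmployeeV1:
    nr: int
    birthcountry: Optional[str] = None


# Version 2: Country becomes a class so that it can carry its own facts.
@dataclass(frozen=True)
class Country:
    code: str
    population: Optional[int] = None  # the newly recorded detail


@dataclass
class EmployeeV2:
    nr: int
    birth_country: Optional[Country] = None


def born_in_v1(e: EmployeeV1, code: str) -> bool:
    return e.birthcountry == code


def born_in_v2(e: EmployeeV2, code: str) -> bool:
    # The query had to be rewritten to navigate through the Country object.
    return e.birth_country is not None and e.birth_country.code == code


# Fact-based alternative: one collection per fact type.
was_born_in = {(101, "AU")}        # Employee was born in Country
country_population = {"AU": 1000}  # placeholder value, not a real statistic


def born_in_facts(emp_nr: int, code: str) -> bool:
    # Unaffected by the later addition of country_population.
    return (emp_nr, code) in was_born_in
```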

Another reason for introducing a Country class is to enable a listing of countries to be stored, identified by their country codes, without requiring all of these countries to participate in a fact. To do this in ORM, we simply declare the Country type to be independent. The object type Country may be populated by a reference table that contains those country codes of interest (e.g., ‘AU’ denotes Australia).

A typical argument in support of attributes runs like this: “Good UML modelers would declare country as a class in the first place, anticipating the need to later record something about it, or to maintain a reference list; on the other hand, features such as the title and gender of a person clearly are things that will never have other properties, and hence are best modeled as attributes”. This argument is flawed. In general, you can’t be sure about what kinds of information you might want to record later, or about how important some model feature will become.

Even in the title and gender case, a complete model should include a relationship type to indicate which titles are restricted to which gender (e.g., “Mrs”, “Miss”, “Ms”, and “Lady” apply only to the female sex). In ORM this kind of constraint can be captured graphically as a join-subset constraint or textually as a constraint in a formal ORM language (e.g., If Person1 has a Title that is restricted to Gender1 then Person1 is of Gender1). In contrast, attribute usage hinders expression of the relevant restriction association (try expressing and populating this rule in UML).
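
Once the restriction is itself stored as a populated fact type, the rule can be checked mechanically. The following Python sketch assumes each fact type is held as a set of tuples. The person identifiers, gender codes, and function name are hypothetical, and the check is a plain nested loop rather than any ORM tool's actual mechanism.

```python
from typing import List, Set, Tuple

# Person has Title
has_title: Set[Tuple[str, str]] = {("p1", "Mrs"), ("p2", "Mr"), ("p3", "Ms")}
# Person is of Gender
is_of_gender: Set[Tuple[str, str]] = {("p1", "F"), ("p2", "M"), ("p3", "M")}
# Title is restricted to Gender
restricted_to: Set[Tuple[str, str]] = {
    ("Mrs", "F"), ("Miss", "F"), ("Ms", "F"), ("Lady", "F"),
}


def title_gender_violations(
    has_title: Set[Tuple[str, str]],
    is_of_gender: Set[Tuple[str, str]],
    restricted_to: Set[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """List (person, title, required gender) triples that break the rule:
    if a person has a title restricted to a gender, the person is of that gender.
    """
    violations = []
    for person, title in has_title:
        for restricted_title, gender in restricted_to:
            if restricted_title == title and (person, gender) not in is_of_gender:
                violations.append((person, title, gender))
    return sorted(violations)
```

With this sample population the only violation is ("p3", "Ms", "F"). Titles that have no entry in restricted_to, such as "Mr" in this sample, are left unconstrained.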

ORM includes algorithms for dynamically generating ER and UML diagrams as attribute views. These algorithms assign different levels of importance to object types depending on their current roles and constraints, redisplaying minor fact types as attributes of the major object types. Modeling and maintenance are iterative processes. The importance of a feature can change with time as we discover more of the global model, and the domain being modeled itself changes.

To promote semantic stability, ORM makes no commitment to relative importance in its base models, instead supporting this dynamically through views. Elementary facts are the fundamental units of information and are uniformly represented as relationships; how they are grouped into structures is not a conceptual issue. You can have your cake and eat it too by using ORM for analysis, and if you want to work with UML class diagrams, you can use your ORM models to derive them.

One way of modeling the sports played by employees in UML is shown in Figure 9.7(a). Here the information about who plays what sport is modeled as the multivalued attribute “sports”. The “[0..*]” multiplicity constraint on this attribute indicates how many sports may be entered here for each employee. The “0” indicates that it is possible that no sports might be entered for some employee. UML uses a null value for this case, just like the relational model. The presence of nulls exposes users to implementation rather than conceptual issues and adds complexity to the semantics of queries. The “*” in “[0..*]” indicates there is no upper bound on the number of sports of a single employee. In other words, an employee may play many sports, and we don’t care how many. If “*” is used without a lower bound, this is taken as an abbreviation for “0..*”.

Figure 9.7. (a) Multivalued UML sports attribute depicted as (b) ORM m:n fact type.


An equivalent ORM schema is shown in Figure 9.7(b). Here an optional, many:many fact type is used instead of the multivalued sports attribute. As discussed in the next section, this approach may also be used in UML using an m:n association.

To discuss class instance populations, UML uses object diagrams. These are essentially class diagrams in which each object is shown as a separate instance of a class, with data values supplied for its attributes. As a simple example, Figure 9.8(a) includes object diagrams to model three employee instances along with their attribute values. The ORM model in Figure 9.8(b) displays the same sample population, using fact tables to list the fact instances.

Figure 9.8. Populated models in (a) UML and (b) ORM.


For simple cases like this, object diagrams are useful. However, they rapidly become unwieldy if we wish to display multiple instances for more complex cases. In contrast, fact tables scale easily to handle large and complex cases.

ORM constraints are easily clarified using sample populations. For example, in Figure 9.8(b) the absence of employee 101 in the Plays fact table clearly shows that playing sport is optional, and the uniqueness constraints mark out which column or column-combination values can occur on at most one row. In the EmployeeName fact table, the first column values are unique, but the second column includes duplicates. In the Plays table, each column contains duplicates: only the whole rows are unique. Such populations are very useful for checking constraints with the subject matter experts. This validation-via-example feature of ORM holds for all its constraints, not just mandatory roles and uniqueness, since all its constraints are role-based or type-based, and each role corresponds to a fact table column.
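
The way a uniqueness constraint reads off a fact table can be sketched in a few lines of Python. The rows below are hypothetical stand-ins that follow the pattern described above; they are not the actual population of Figure 9.8(b). The function name is_unique is likewise an assumption.

```python
from typing import Iterable, Sequence, Tuple


def is_unique(rows: Iterable[Tuple], columns: Sequence[int]) -> bool:
    """True if no two rows agree on all of the given column positions."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical populations: (employee number, name) and (employee number, sport).
employee_name = [(101, "Smith J"), (102, "Jones E"), (103, "Smith J")]
plays = [(102, "judo"), (103, "judo"), (103, "soccer")]

# Employee 101 has a name but no Plays row, so playing sport is optional.
employees_who_play = {emp_nr for emp_nr, _ in plays}
```

In this sample, is_unique(employee_name, [0]) holds while is_unique(employee_name, [1]) does not, and in the Plays table only the column combination [0, 1] is unique.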

As a final example of multivalued attributes, suppose that we wish to record the nicknames and colors of country flags. Let us agree to record at most two nicknames for any given flag and that nicknames apply to only one flag. For example, “Old Glory” and perhaps “The Star-spangled Banner” might be used as nicknames for the United States flag. Flags have at least one color.

Figure 9.9(a) shows one way to model this in UML. The “[0..2]” indicates that each flag has at most two (from zero to two) nicknames. The “[1..*]” declares that a flag has one or more colors. An additional constraint is needed to ensure that each nickname refers to at most one flag. A simple attribute uniqueness constraint (e.g., {U1}) is not enough, since the nicknames attribute is set valued. Not only must each nicknames set be unique for each flag, but each element in each set must be unique (the second condition implies the first). This more complex constraint is specified informally in an attached note.

Figure 9.9. A flag model in (a) UML and (b) ORM.


Here the attribute domains are hidden. Nickname elements would typically have a data type domain (e.g., String). If we don’t store other information about countries or colors, we might choose String as the domain for country and color as well (although this is subconceptual, because real countries and colors are not character strings). However, since we might want to add information about these later, it’s better to use classes for their domains (e.g., Country and Color). If we do this, we need to define the classes as well.

Figure 9.9(b) shows one way to model this in ORM. For verbalization we identify each flag by its country. Since country is an entity type, the reference scheme is shown explicitly (reference modes may abbreviate reference schemes only when the referencing type is a value type). The “≤ 2” frequency constraint indicates that each flag has at most two nicknames, and the uniqueness constraint on the role of NickName indicates that each nickname refers to at most one flag.
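
Because each of these constraints applies to a role of a fact type, each can be checked directly against a fact population. The Python sketch below stores the two fact types as sets of (country, value) pairs. The country codes, the color population, and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import Set, Tuple

Facts = Set[Tuple[str, str]]

# Flag has NickName, and Flag has Color, with flags identified by country.
flag_has_nickname: Facts = {
    ("US", "Old Glory"),
    ("US", "The Star-spangled Banner"),
}
flag_has_color: Facts = {("US", "red"), ("US", "white"), ("US", "blue")}
flags = {"US"}


def nickname_frequency_ok(facts: Facts, limit: int = 2) -> bool:
    """Frequency constraint: each flag has at most `limit` nicknames."""
    per_flag = Counter(flag for flag, _ in facts)
    return all(n <= limit for n in per_flag.values())


def nickname_unique_ok(facts: Facts) -> bool:
    """Uniqueness constraint: each nickname refers to at most one flag."""
    per_nickname = Counter(nickname for _, nickname in facts)
    return all(n == 1 for n in per_nickname.values())


def every_flag_has_color(flags: Set[str], facts: Facts) -> bool:
    """Mandatory role: each flag has at least one color."""
    return flags <= {flag for flag, _ in facts}
```

Adding a hypothetical fact such as ("AU", "Old Glory") would make nickname_unique_ok fail, which is the case the informal UML note has to rule out.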

UML gives us the choice of modeling a feature as an attribute or an association. For conceptual analysis and querying, explicit associations usually have many advantages over attributes, especially multivalued attributes. This choice helps us verbalize, visualize, and populate the associations. It also enables us to express various constraints involving the “role played by the attribute” in standard notation, rather than resorting to some nonstandard extension. This applies not only to simple uniqueness constraints (as discussed earlier) but also to other kinds of constraints (frequency, subset, exclusion, etc.) over one or more roles that include the role played by the attribute’s domain (in the implicit association corresponding to the attribute).

For example, if the association Flag is of Country is depicted explicitly in UML, the constraint that each country has at most one flag can be captured by adding a multiplicity constraint of “0..1” on the left role of this association. Although country and color are naturally conceived as classes, nickname would normally be construed as a data type (e.g., a subtype of String). Although associations in UML may include data types (not just classes), this is somewhat awkward; so in UML, nicknames might best be left as a multivalued attribute. Of course, we could model it cleanly in ORM first.

Another reason for favoring associations over attributes is stability. If we ever want to talk about a relationship, it is possible in both ORM and UML to make an object out of it and simply attach the new details to it. If instead we modeled the feature as an attribute, we would need to first replace the attribute by an association. For example, consider the association Employee plays Sport in Figure 9.8(b). If we need to record a skill level for this play, we can simply objectify this association as Play, and attach the fact type: Play has SkillLevel. A similar move can be made in UML if the play feature has been modeled as an association. In Figure 9.8(a), however, this feature is modeled as the sports attribute, which needs to be replaced by the equivalent association before we can add the new details about skill level. The notion of objectified relationship types or association classes is covered in a later section.
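
The difference between the two starting points can be sketched in Python. The sample data, skill level values, and names below are hypothetical. When each play is already a tuple, it can serve as a key to which the skill level is attached, whereas a set-valued sports attribute has to be restructured first.

```python
from typing import Dict, Optional, Set, Tuple

Play = Tuple[int, str]  # (employee number, sport)

# Association form: Employee plays Sport.
plays: Set[Play] = {(102, "judo"), (103, "judo"), (103, "soccer")}

# Objectified form: each play is a key, so Play has SkillLevel is a pure addition.
play_has_skill_level: Dict[Play, int] = {(102, "judo"): 4, (103, "soccer"): 2}


def skill_levels_are_grounded(plays: Set[Play], levels: Dict[Play, int]) -> bool:
    """Every skill level must be attached to a play that actually exists."""
    return set(levels) <= plays


# Attribute form: a set of sports per employee has no place for a skill level.
sports_by_employee: Dict[int, Set[str]] = {102: {"judo"}, 103: {"judo", "soccer"}}

# Recording skill levels forces the attribute's structure to change,
# and every query written against the old structure changes with it.
sports_with_levels: Dict[int, Dict[str, Optional[int]]] = {
    102: {"judo": 4},
    103: {"judo": None, "soccer": 2},
}
```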

Another problem with multivalued attributes is that queries on them need some way to extract the components, which complicates the query process for users. As a simple example, compare queries Q1 and Q2, expressed in ConQuer (an ORM query language), with their counterparts Q1a and Q2a in OQL (the Object Query Language proposed by the ODMG). Although this example is trivial, the use of multivalued attributes in more complex structures can make it harder for users to express their requirements.

(Q1) List each Color that is of Flag ‘USA’.
(Q2) List each Flag that has Color ‘red’.
(Q1a) select x.colors from x in Flag where x.country = "USA"
(Q2a) select x.country from x in Flag where "red" in x.colors
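
The same contrast can be sketched in Python, used here as a stand-in for either query language. The sample population and function names are assumptions. With the fact-based form both queries filter the same set of pairs in the same way, whereas with the set-valued attribute Q1 returns the collection directly and Q2 has to test membership inside each collection.

```python
from typing import Dict, Set, Tuple

# Fact-based form: Flag has Color, with flags identified by country.
flag_has_color: Set[Tuple[str, str]] = {
    ("USA", "red"), ("USA", "white"), ("USA", "blue"),
    ("Japan", "red"), ("Japan", "white"),
}


def colors_of_flag(country: str) -> Set[str]:
    """Q1: list each Color that is of the given Flag."""
    return {color for flag, color in flag_has_color if flag == country}


def flags_with_color(wanted: str) -> Set[str]:
    """Q2: list each Flag that has the given Color."""
    return {flag for flag, color in flag_has_color if color == wanted}


# Attribute-based form: each flag object carries a set-valued colors attribute.
flag_colors: Dict[str, Set[str]] = {
    "USA": {"red", "white", "blue"},
    "Japan": {"red", "white"},
}


def colors_of_flag_attr(country: str) -> Set[str]:
    """Q1a: return the whole colors collection of one flag."""
    return flag_colors[country]


def flags_with_color_attr(wanted: str) -> Set[str]:
    """Q2a: test membership inside each flag's colors collection."""
    return {country for country, colors in flag_colors.items() if wanted in colors}
```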

For such reasons, multivalued attributes should normally be avoided in analysis models, especially if the attributes are based on classes rather than data types. If we avoid multivalued attributes in our conceptual model, we can still use them in the actual implementation. Some UML and ORM tools allow schemas to be annotated with instructions to override the default actions of whatever mapper is used to transform the schema to an implementation. For example, the ORM schema in Figure 9.9 might be prepared for mapping by annotating the roles played by NickName and Color to map as sets inside the mapped Flag structure. Such annotations are not a conceptual issue, and can be postponed until mapping.
