2.9. Data Dictionary

Working Models

We stress again and again that the best way to specify a system is to build a working model of it. A working model is one that demonstrates each process in the data flow diagram is capable of manufacturing its outputs from its inputs, and that each entity and relationship in the data model can supply or store the data needed by all the processes.

What exactly are the inputs and outputs and entities? So far, you have used meaningful names to indicate the content of the flows, but have trusted that each of the data flows in the model somehow contains whatever is needed. You have also trusted that the data stores and entities contain whatever data are needed. They probably do, but you must prove it.

The data dictionary is the part of the model that provides definitions of the data flows, data elements, stores, entities, and relationships. After you have built the dictionary, you know that you have a precise understanding of the data, and that your model really works.

The Meaning of Your Data

The definitions in the data dictionary serve the same purpose as the definitions in any dictionary. If you need to know the meaning of a word, you look it up in the dictionary to find the precise meaning of the word. It is exactly the same with the analysis data dictionary. You define all your flows and stores in the dictionary, and your readers use the dictionary to find the exact meaning of any data component of your model.

Let’s look at why this is important. Consider the fragment of a data flow model in Figure 2.9.1. This is a common enough process, found in the many thousands of order processing systems around the world. Despite the everyday nature of the process, and although most people working with business systems have seen quite a few examples of order forms, what are the exact contents of the data flow ORDER FORM in this particular system? Experience tells us that an order form usually contains some customer information, something about the purchased goods, and probably some pricing information. This seems like a reasonable description of an order form, but it is not at all precise. A specification that quoted “some customer information, something about the purchased goods, and probably some pricing information” does not provide the system implementors with what they need. Different programmers would write different programs producing different order forms. If it is to be useful, a specification must be exact.


Figure 2.9.1: This process receives a request by telephone from a customer. It checks the stock availability, and produces an order form that is sent to the warehouse for dispatch.

Suppose that instead of the vague sentence quoted above, you had a definition that looked like this:

Order Form = Customer Ident + Date Of Order
+ Time Of Order + Telephone Clerk Ident
+ {Item Code + Quantity Ordered + Unit Price
+ (Discount Code)}
+ Requested Dispatch Date

With this data dictionary entry, there is only one possible interpretation of the data flow. You write it like this so that you, the users, and the implementor all share a precise and identical understanding of the data.
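To see how little room the entry leaves for interpretation, it can be transcribed almost mechanically into a typed record. The Python sketch below is an illustration only. The class names, field names, types, and sample values are assumptions made for the example; the dictionary entry itself says nothing about physical representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OrderLine:
    # One occurrence of the repeating group
    # {Item Code + Quantity Ordered + Unit Price + (Discount Code)}
    item_code: str
    quantity_ordered: int
    unit_price: float
    discount_code: Optional[str] = None  # parentheses in the entry: optional


@dataclass
class OrderForm:
    customer_ident: str
    date_of_order: str
    time_of_order: str
    telephone_clerk_ident: str
    requested_dispatch_date: str
    lines: List[OrderLine] = field(default_factory=list)  # braces: repeats


# Hypothetical sample values, used only to show the shape of the data.
form = OrderForm(
    customer_ident="C042",
    date_of_order="1 March",
    time_of_order="10:15",
    telephone_clerk_ident="TC7",
    requested_dispatch_date="8 March",
    lines=[OrderLine("A100", 2, 9.50), OrderLine("B200", 1, 4.00, "D5")],
)
```

Every component of the entry appears exactly once in the record, the optional item becomes a field that may be empty, and the repeating group becomes a list.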

Defining Terms in the Context

When you read the above definition of the order form, did you think that it was different from order forms you’ve seen before? It probably was. There is no universal law that decrees the contents of order forms. In different systems, “order form” means different things. So you must define what an order form means within the context of this system.

Consider the following illustration of context from S.I. Hayakawa’s classic book Language in Thought and Action:

Let us see how dictionaries are made and how the editors arrive at definitions. What follows applies, incidentally, only to those dictionary offices where first-hand, original research goes on—not those in which editors simply copy existing dictionaries. The task of writing a dictionary begins with reading vast amounts of the literature of the period or subject that the dictionary is to cover. As the editors read, they copy on cards every interesting or rare word, a large number of common words in their ordinary uses, and also the sentences in which each of these words appears, thus:

pail

The dairy pails bring home increase of milk

Keats, Endymion
I, 44-45

That is to say, the context of each word is collected, along with the word itself. For a really big job of dictionary writing, such as the Oxford English Dictionary (usually bound in about twenty-five volumes), millions of such cards are collected, and the task of editing occupies decades. As the cards are collected, they are alphabetized and sorted. When the sorting is completed, there will be for each word anywhere from two or three to several hundred illustrative quotations, each on its own card.

To define a word, then, the dictionary editor places before him the stack of cards illustrating that word; each of the cards represents an actual use of the word by a writer of some literary or historical importance. He reads the cards carefully, discards some, rereads the rest, and divides up the stack according to what he thinks are the several senses of the word. Finally, he writes his definitions, following the hard-and-fast rule that each definition must be based on what the quotations in front of him reveal about the meaning of the word. The editor cannot be influenced by what he thinks a given word ought to mean. He must work according to the cards or not at all.*

*From Language in Thought and Action by S.I. Hayakawa. Copyright © 1963, 1964 by Harcourt, Brace & World, Inc. Reproduced with permission by George Allen & Unwin Ltd.

Hayakawa is talking about writing dictionaries using “first-hand, original research.” This applies to you, too: The analysis data dictionary is certainly an original dictionary. Note what Hayakawa says about the editor working within the context of the period or subject of the dictionary and not attempting to influence it by his own opinion. This is precisely what you must do as well: You must not assign meanings that the term had in some previous system, nor give your opinion as to what the term ought to mean. The data dictionary entries define the meaning of data within the context of the system being studied.

Problems of Accuracy

Whenever a document appears in the model as a data flow, instead of writing data dictionary definitions, you might be tempted to include a sample of a document in the specification. This turns out to be a bad idea in the long term because the document focuses on the physical implementation of the data, rather than the meaning behind the implementation.

Since the document is designed for the current system, it may not be suitable for any new implementation. The document focuses on the solution to a problem and often hides the real problem. To understand the problem, you need to see the pure data. You must write definitions that ignore both the medium and any data that are dependent on the medium. In other words, you write a description of only the information content of the document.

There is always a strong possibility that a document can be misunderstood: Columns are often headed with cryptic labels; the document may contain redundant information; most documents contain at least one example of ambiguity; and often the design of a document imposes an unintended meaning. The data dictionary definition, however, is written in a manner that cannot be misunderstood. That means you need a suitable notation.

Notation

To specify the data flows and stores, you define their data content by using these operators:

= anything after this is combined to make up a flow/store/entity

+ shows the combination of components

For example,

Applicant Registration = Applicant Name + Applicant Address
+ Applicant Date Of Birth + Applicant History

This data dictionary entry means that each time the model refers to APPLICANT REGISTRATION, it means the combination of the applicant’s name, address, date of birth, and history.

The dictionary also shows

Applicant Address = House Number + Street Name
+ (Apartment Number)
+ City Name + State Name + Zip Code

The parentheses enclosing APARTMENT NUMBER indicate that this data item is optional; that is, not every APPLICANT ADDRESS has to include it. For example, most people living in New York City have an apartment number in their address, but residents of Pooletown, Oklahoma (population 420), do not use apartment numbers, as everybody lives in a house.

Some data items need to occur a number of times:

{Anything enclosed in braces repeats}

For example,

Applicant History = (Current Job)
+ {Previous Job}

This entry says that an applicant may have a current job, and some number of previous ones. Let’s say the system is only interested in the three most recent previous jobs. Additionally, let’s say that the company policy is that it doesn’t consider applicants who have never worked. You could then write this definition:

Applicant History = (Current Job)
+ 1:3 {Previous Job}

The cardinal operators (1:3 preceding the braces) add information to the definition. It now says that every applicant must have had at least one previous job, and that no more than three are recorded. However, the restriction of three previous jobs is there for policy reasons. When the company interviews applicants, it doesn’t take into account any but the last three jobs. The cardinal operators have no bearing on the physical capacity of the current form. Perhaps the current form has room for only two jobs, and applicants get around this by writing the third job on the back of the form. The logical limit is still three, even though the physical limit of the form is two.

When cardinal operators are not shown (which is the usual case), the lower limit is zero, and the upper limit is some undetermined number. For example, consider this entry:

Applicant Name = {Forename}
+ Surname

The business policy allows the applicant to have no forename at all or as many as he wants. While this may at first appear to be silly, in the context of this system, it is reasonable. Applicants will have a surname and probably, but not necessarily, one or more forenames. There is no upper limit to the number of forenames, as members of the English and Spanish aristocracies have a considerable and unpredictable number of forenames.

The current policy yields the definition 1:3 {Previous Job}. You would write 1: {Previous Job} if the system considers all the applicant’s previous jobs; or :3 {Previous Job} if the system relaxes the rule prohibiting first-time workers, but retains the upper limit.

Use cardinal operators only when you have something special to say. Usually, the context of the definition makes the operators apparent to your readers.
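The meaning of the cardinal operators can be stated as a small check on the number of occurrences of a repeating group. The following Python sketch assumes a hypothetical helper named within_limits; a missing lower limit is taken as zero and a missing upper limit as unbounded, as described above.

```python
from typing import Optional


def within_limits(count: int, lower: int = 0,
                  upper: Optional[int] = None) -> bool:
    """True if `count` occurrences satisfy the limits lower:upper."""
    if count < lower:
        return False
    return upper is None or count <= upper


within_limits(2, lower=1, upper=3)   # 1:3 {Previous Job}
within_limits(7, lower=1)            # 1:  {Previous Job}
within_limits(0, upper=3)            # :3  {Previous Job}
within_limits(0)                     # {Forename}, no operators shown
```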

On some occasions, you need to make a selection between data items. Let’s say you are analyzing a system that sends a newsletter to its customers. The customers may elect to have their mail sent to either their home address or the office. The notation for selection is

[ | ] The square brackets enclose a number of choices. The choices are separated by a vertical bar (|). It says [pick this | or this | or this].

The data dictionary entry for the address reads

Mailing Address = [Home Address | Office Address]
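Selection means that exactly one of the listed alternatives is present in any occurrence of the item. As a rough illustration, the Python sketch below enforces that rule for MAILING ADDRESS. The function name, parameter names, and the sample address are assumptions made for the example.

```python
def mailing_address(home_address=None, office_address=None):
    """Return (kind, value) for Mailing Address = [Home Address | Office Address]."""
    candidates = (("Home Address", home_address),
                  ("Office Address", office_address))
    supplied = [(kind, value) for kind, value in candidates
                if value is not None]
    if len(supplied) != 1:
        # Neither alternative, or both: not a valid selection.
        raise ValueError("exactly one alternative must be supplied")
    return supplied[0]


chosen = mailing_address(home_address="12 Elm Street")
```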

Enlightening notes in the data dictionary are occasionally useful.

* Comments are enclosed by asterisks *

For example, the analyst may add

Mailing Address = [Home Address | Office Address]
* Home addresses are preferred *

Strictly speaking, this comment is not necessary as it is not defining data. However, it probably saves more time than it wastes, so you may think of it as a beneficial surplus. Please don’t make indiscriminate use of comments. They should only add information that is of interest to the analysis.

Further Decomposition

Some definitions include items that need further breakdown. For instance, consider this entry:

Applicant Registration = Applicant Name + Applicant Address
+ Applicant Date Of Birth + Applicant History

Any component that is itself made up of components must be further decomposed. Thus,

Applicant Name = {Forename}
+ Surname

Applicant Address = House Number + Street Name
+ (Apartment Number)
+ City Name + State Name + Zip Code

Applicant History = (Current Job)
+ 1:3 {Previous Job}

Keep breaking these components down until you reach a primitive level. Such a primitive is called a data element.
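The breakdown is naturally recursive: follow each component to its own entry until no further entry exists. The Python sketch below assumes the dictionary is held as a mapping from a name to its component names, with the optional and repeating markers dropped for brevity. The function name leaves is hypothetical. A name that the sketch stops at is not necessarily a data element; it may simply have no entry yet, as is the case here for CURRENT JOB and PREVIOUS JOB.

```python
# Composite entries only: name -> component names.
DICTIONARY = {
    "Applicant Registration": ["Applicant Name", "Applicant Address",
                               "Applicant Date Of Birth", "Applicant History"],
    "Applicant Name": ["Forename", "Surname"],
    "Applicant Address": ["House Number", "Street Name", "Apartment Number",
                          "City Name", "State Name", "Zip Code"],
    "Applicant History": ["Current Job", "Previous Job"],
}


def leaves(name, dictionary):
    """Names reached when `name` is broken down as far as the entries go."""
    if name not in dictionary:
        return [name]          # no entry of its own: stop here
    result = []
    for component in dictionary[name]:
        result.extend(leaves(component, dictionary))
    return result


registration_leaves = leaves("Applicant Registration", DICTIONARY)
```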

Data Elements

A data item is primitive if, within your context of study, you can give it a value. Therefore, it is something you can’t reasonably break down any further. For example, the definition given above for the applicant’s address includes a house number. From the analyst’s point of view, the house number is a primitive item of data, and therefore has the definition:

House Number = * Data element *

Of course, you could define house number as

House Number = 1:6 {Digit}

But this definition provides no real enlightenment. Any numerical piece of data can be defined as {Digit}. The values for this data element are called continuous. There is an almost infinite variety of numbers that identify houses. Since most people understand this, there is nothing more of interest to say about house numbers. Once you define it as * Data element *, you have nothing more to add. You and your users have no direct concern with the size of the item, and that implementation detail can be safely left until later.

There is an exception to the rule. Some data elements have a strictly limited number of possible values. These are called discrete elements, and their possible values are shown in your dictionary. For example, this book was written on three Macintosh® computers, which are linked by networking software that allows any of the three to be file servers. An appropriate definition looks like this:

Server Id = * Data element *
[“Ian” | “James” | “Suzanne”]
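A discrete element corresponds closely to an enumerated type. The following Python sketch is one possible rendering of the SERVER ID entry; the class name and the helper is_valid_server_id are assumptions made for the example.

```python
from enum import Enum


class ServerId(Enum):
    # The complete list of permitted values, taken from the dictionary entry.
    IAN = "Ian"
    JAMES = "James"
    SUZANNE = "Suzanne"


def is_valid_server_id(value: str) -> bool:
    """True if `value` is one of the values listed for Server Id."""
    return value in {member.value for member in ServerId}
```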

If the values of a data element are defined elsewhere, you can use the data dictionary entry to point to the defined values:

Country Dialing Code = * Data element. See booklet supplied by telephone company.*

You can also use data dictionary comments to record a data element’s unit of measure:

Mountain Height = * Data element. Height above sea level measured in feet. *

Defining Calculations

Mathematical calculations are traditionally placed in the mini specifications. However, many analysts have discovered the convenience of defining algorithms in the data dictionary using the comment notation:

Total Floor Area = * Data element. Calculate as room length times room width. Units: square meters. *

This definition tells us that TOTAL FLOOR AREA is a data element that can be calculated by the algorithm. If the calculation of TOTAL FLOOR AREA is referred to in more than one process, it makes sense to define the algorithm in the data dictionary rather than duplicating it in many mini specifications.
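Defining the algorithm once in the dictionary is comparable to writing a single shared function instead of repeating the arithmetic in every process. A minimal Python sketch follows, with a hypothetical function name and parameter names; the units follow the dictionary entry.

```python
def total_floor_area(room_length_m: float, room_width_m: float) -> float:
    """Total Floor Area in square meters: room length times room width."""
    return room_length_m * room_width_m


area = total_floor_area(4.0, 3.0)
```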

Defining Data Stores, Entities, and Relationships

Some of the most important data in the system are stored. Most of the data that enter the system through the incoming boundary data flows end up in a data store. Figure 2.9.2 illustrates a data model that was derived from a data store. The data dictionary should contain definitions of the data stores, as well as the data model’s entities and relationships.


Figure 2.9.2: To complete the data dictionary that supports your data flow models, define all the stores and the flows. Similarly, to complete your data model, define all the entities and relationships.

Applicant = Applicant Name + Applicant Date Of Birth + Applicant Address
+ Date Registered + Salary Required

Applicant Attributes = Applicant Registration

Applicant Registration = Applicant Name + Applicant Address
+ Applicant Date Of Birth + Current Job + Salary Required
+ {Previous Job}

Applicant Store = {Applicant Name + Applicant Date Of Birth
+ Applicant Address + Date Registered + Salary Required
+ Current Job
+ {Previous Job}
+ Interview Comments}

Discussion = * Relationship. Cardinality: for each Applicant, there are many
Interviews; for each Interview, there is one Applicant. *

Employment = * Relationship. Cardinality: for each Applicant, there are many
Jobs; for each Job, there is one Applicant. *

Interview = Interview Session Number + Interview Comments

Interview Comments = * Free text comments on the applicant and his/her
attitudes and aspirations *

Job = Type Of Work + Employer Name + Date Started + Date Ended

Rejected Application = Applicant Name + Applicant Address
+ Reasons For Rejection

Definitions of data stores, entities, and relationships are not fundamentally different from definitions of flows. However, there are several special differences to note.

First, definitions of data stores are enclosed by braces { } because the data in a store repeat. In our example, the applicant file holds information about several hundred people.

Second, if an occurrence of the store, entity, or relationship has to be identified, its unique identifiers are underlined and, by convention, placed first in the list of attributes. For example, in Figure 2.9.2, each interview carries a unique INTERVIEW SESSION NUMBER. Note how a composite identifier (a combination of several data elements) identifies the APPLICANT and JOB entities.

Third, the relationships in Figure 2.9.2 do not contain data, so the data dictionary records them with only a comment, just like a data element. Naturally, if they held data, you would define the data in the same way as if the relationship were a regular data store. Relationships are formed whenever there is a business need to remember the association between entities. Some analysts use the dictionary entry to record the business role of the relationship, including why it is necessary, and what conditions must exist with the participating entities to form the relationship.

Note the correlation between the data store APPLICANT STORE and the data model. A rule of thumb is to form each repeating group into an entity that relates to whatever came before the group. Because of this convention, some analysts write data dictionary entries for all the stores before attempting their data models. However, this heuristic only works when the original data stores have already been logically partitioned.
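The rule of thumb can be sketched as a small transformation: lift the repeating group out of each stored record and keep the parent's identifier with every occurrence, so the association is not lost. In the Python sketch below, the function name, the sample records, and the choice of APPLICANT NAME plus APPLICANT DATE OF BIRTH as the applicant's identifier are all assumptions made for illustration.

```python
def split_repeating_group(store, group_name, identifier_names):
    """Separate a nested repeating group from each record in a store.

    Returns (parents, children). Each child row carries the parent's
    identifier values so that the relationship can be reconstructed.
    """
    parents, children = [], []
    for record in store:
        parents.append({k: v for k, v in record.items() if k != group_name})
        for occurrence in record.get(group_name, []):
            child = {name: record[name] for name in identifier_names}
            child.update(occurrence)
            children.append(child)
    return parents, children


# Hypothetical sample data, cut down from the APPLICANT STORE entry.
applicant_store = [
    {"Applicant Name": "A. Smith", "Applicant Date Of Birth": "1 Jan 1960",
     "Previous Job": [{"Type Of Work": "Clerk"}, {"Type Of Work": "Typist"}]},
    {"Applicant Name": "B. Jones", "Applicant Date Of Birth": "30 Jun 1955",
     "Previous Job": [{"Type Of Work": "Driver"}]},
]

applicants, jobs = split_repeating_group(
    applicant_store, "Previous Job",
    ["Applicant Name", "Applicant Date Of Birth"])
```

Here applicants plays the part of the APPLICANT entity and jobs the part of the JOB entity, with the carried identifier standing in for the EMPLOYMENT relationship.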

What Do You Put in the Data Dictionary?

The answer to this question is

EVERYTHING

Every piece of data that the system uses must be recorded in the data dictionary. This means every data flow, data store, entity, relationship, and data element has an entry in the dictionary. Until the dictionary is complete, the model is not complete.
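Completeness in this sense can be checked mechanically: every name used inside a definition must have an entry of its own. The Python sketch below assumes the dictionary is held as a mapping from a name to its component names, with an empty list standing for a * Data element * entry. The function name and the sample entries, including the components given for HOME ADDRESS, are hypothetical.

```python
def undefined_names(dictionary):
    """Names used as components that have no entry of their own."""
    used = {component
            for components in dictionary.values()
            for component in components}
    return sorted(used - set(dictionary))


sample_dictionary = {
    "Mailing Address": ["Home Address", "Office Address"],
    "Home Address": ["House Number", "Street Name"],
    "House Number": [],   # * Data element *
    "Street Name": [],    # * Data element *
}

missing = undefined_names(sample_dictionary)
```

In this sample, OFFICE ADDRESS is used but never defined, so the dictionary, and therefore the model, is not yet complete.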

We have another answer to the same question:

NOT EVERYTHING

You must restrict your data dictionary to information that is useful to the analysis. This means the dictionary shows the composition of the data flows, data stores, entities, and relationships, and the meaning of each data element. Information about the physical implementation of the data is not important to the analysis. Remember that you are writing the requirements for the system—not designing its implementation. The requirement is for a data element to exist; it is not for the element to have seven numeric digits with two decimal places, nor is it a requirement to note that the element is currently held in packed binary format.

This information may well be needed later by the implementors and database designers. If you include all such information in your dictionary now, you will add unnecessary clutter. During analysis, concern yourself with what the system has to do. While you are still gathering the requirements for the system, you are hardly in the best position to decide the implementation. After analysis, when you know all the requirements, you can determine the best way to implement the data, and then it will be appropriate to add the design decisions to the data dictionary.

Aliases

Aliases are found in the dictionary when several names exist for the same item, or when several analysts inadvertently use different names for the same item. For example, suppose there is a collection of data elements that some people call “Buying instruction” and others call “Purchase requisition.” Defining this item twice introduces redundancy into the dictionary and makes the specification misleading. How do you treat aliases?

You must first decide which of the names is more descriptive, and then write a pair of definitions. One definition is for the preferred name, like so:

Buying Instruction = * Alias Purchase Requisition *
Proposed Supplier Ident + Required Delivery Date
+ Ordering Authorization
+ {Item Required + ...

The entry for the other term is as follows:

Purchase Requisition = * See Buying Instruction *

In the example, the * Alias Purchase Requisition * comment acknowledges the existence of another name, but anybody looking up that name is referred to BUYING INSTRUCTION for the full definition.
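The pair of entries behaves like a lookup that redirects the alias to the preferred name before the full definition is read. A minimal Python sketch follows, assuming a hypothetical ALIASES table and helper preferred_name. The guard against circular references is an addition for safety, not something the notation itself requires.

```python
# Alias -> preferred name, mirroring "* See Buying Instruction *".
ALIASES = {"Purchase Requisition": "Buying Instruction"}


def preferred_name(name, aliases):
    """Follow 'See ...' references until a name with a full definition is reached."""
    seen = set()
    while name in aliases:
        if name in seen:
            raise ValueError("circular alias: " + name)
        seen.add(name)
        name = aliases[name]
    return name


resolved = preferred_name("Purchase Requisition", ALIASES)
```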

Avoid aliases whenever possible by encouraging people to adopt standard naming conventions. This is worthwhile, as the existence of synonyms makes the dictionary more difficult to manage. You must always check before writing a new definition that your intended entry does not already exist in the dictionary under another name. This is easier if you compare your data dictionary definitions with those of other analysts who are modeling the same data, or who have data flows or data stores that interface with your models. Since the data dictionary tightly connects the data model and the process models, it should be used to control data names and prevent aliases, especially at the data element level.

Summary

The data dictionary adds a strengthening element to your data flow models. Your dictionary defines all the data shown in your diagrams, and with this knowledge of the data, you should be able to prove that each process can produce its outputs from its inputs. The data dictionary also completes your data model. By defining its entities and relationships, you now have a complete understanding, as well as a written description, of the system’s stored data.

It’s a good idea to think of the data dictionary as something that you build in parallel with the other models. Don’t wait until the other models are almost complete before starting the data dictionary. Waiting means that you must write several hundred, possibly several thousand, dictionary entries all at one time—a tedious task, to say the least. Also, if you postpone writing a definition, you risk forgetting its meaning and you lose the greatest benefit of the dictionary: forcing you to think about the data and to understand more precisely how each of the processes, entities, and relationships uses the data. When you understand all that, you understand the system.

Now let’s write some definitions.

Exercises

Here is a reminder of the data dictionary operators:

= is made up of

+ and

[Select this | or this | or this]

{This item is repeating}

(This one is optional)

* Here is a comment *

Write data dictionary entries for the following:

1. Define the title page of this book. (That’s the one with “Dorset House Publishing” near the bottom of the page.)

2. Assume this statement is correct: “The selling price for a line in the order form can be derived in a number of ways. Sometimes, a salesman has negotiated a special price. If a discount rate has been written on the form, then a discount has been given to the customer. The discounted price is calculated by subtracting the discounted standard price from the standard price.” Use the following definition for a line on the order form:

Order Line = Product Description + Product Code + Pack Size
+ Quantity Ordered + Quantity Sent
+ (Discount Rate)
+ Selling Price

Write the data dictionary entry for SELLING PRICE.

3. Within the context of your system, a person’s identity is made up of his name, address, and date of birth. If a social security number is available, that alone is sufficient as an identifier. Write the definition for PERSONAL IDENTITY.

4. The client identifier is the client’s name, or an acronym. Sometimes, both are used. Define CLIENT IDENTIFIER.

5. One line of an invoice shows a description of the product, the quantity sold, and the selling price that is an undiscounted price or a discounted price or a special price plus handling. Define INVOICE LINE and SELLING PRICE.

6. An application shows the applicant’s name, address, and telephone number. The telephone number is written as an area code, the number, and an extension if he has one. Not all applicants have a telephone number. Define APPLICATION.

7. When traditional breweries make deliveries of ale to English pubs, they often use container sizes that are part of the folklore of British drinking. The containers are called pins, firkins, hogsheads, barrels, and puncheons. Assume that each pub on the delivery route gets several containers that are listed on a BREWERY DELIVERY NOTE. The containers delivered to one pub are not necessarily all the same size. For background information, a pin holds 4 gallons, a firkin 8, a hogshead 16, a barrel 32, and a puncheon 64. Define BREWERY DELIVERY NOTE.

All data stores shown in the data flow diagram must be defined in the data dictionary. Exercises 8 and 9 describe data stores. Write the data dictionary entry for each one.

8. A consulting company maintains a file of client information. Each client has a record in the file that is made up of the client acronym and the client name. Following these is a number of jobs that are active for that client. Each job must have a unique identity for the job, the type of job, and the start date. The name of the main contact appears after all the jobs for that client. Define CLIENT FILE.

9. The Regional Theater Casting System that you saw in Chapter 2.8, Current Physical Viewpoint, keeps information about actors in a card file. There is a green card for each actor showing his name, address, and date of birth. Following each green card are some more cards. There are white ones that relate to the parts the actor has applied for, and yellow ones for the parts that the actor has actually played during the past ten years. Both cards carry a part classification, and every actor has at least one white and one yellow card.

Each yellow card has the date started and the date ended for each part, the salary, the producer, and any reviews by the critics. Each white card shows the producer who offered the part, the date on which the successful actor is expected to start (the “start date”), and the salary offered, together with a short description of the role (“determined youngest daughter suppressed by tyrannical family,” “bashful young man secretly in love with the boss’s daughter,” and so on). The casting agency keeps actors on file even when they are not in the process of applying for a part. Define this ACTOR FILE.
