Determining Data Relationships

Relationships among multiple sources of input data exist when each source contains common data, either at the physical or logical level. For example, employee data and department data could be related through an employee ID variable that shares common values. Another data set could contain numeric sequence numbers whose partial values logically relate it to a separate data set by observation number. Once data relationships exist, they fall into one of four categories:

  • one-to-one

  • one-to-many

  • many-to-one

  • many-to-many

The categories are characterized by how observations relate among the data sets. All related data fall into one of these categories. You must be able to identify the existing relationships in your data because this knowledge is crucial to understanding how input data can be processed to produce desired results.

The remainder of this section illustrates the four categories.

One-to-One

In a one-to-one relationship, typically a single observation in one data set is related to a single observation from another based on the values of one or more selected variables. A one-to-one relationship implies that each value of the selected variable occurs no more than once in each data set. When working with multiple selected variables, this relationship implies that each combination of values occurs no more than once in each data set. Figure 1.1 presents an example of two data sets with a one-to-one relationship.

Figure 1.1. One-to-One

Observations in SALARY and TAXES are related by common values for EMPNUM.

One-to-Many and Many-to-One

A one-to-many or many-to-one relationship between input data sets implies that one data set has at most one observation with a specific value of the selected variable, but the other input data set might have more than one occurrence of each value. When working with multiple selected variables, this relationship implies that each combination of values occurs no more than once in one data set, but might occur more than once in the other data set. The order in which the input data sets are processed determines whether the relationship is one-to-many or many-to-one. Figure 1.2 presents an example of two data sets with a one-to-many relationship.

Figure 1.2. One-to-Many

Observations in PATIENTS and APPOINTMENTS are related by common values for ID.

Figure 1.3 presents an example of three related data sets. Data sets AGENTS and SALES have a one-to-many relationship. Data sets SALES and QUOTA have a many-to-one relationship.

Figure 1.3. One-to-Many and Many-to-One

Observations in data sets AGENTS, SALES, and QUOTA are related by common values for variable ID. Values of ID are unique in AGENTS and QUOTA, but not in SALES. For ID values HE01 and HH01, a one-to-many relationship exists between observations in data sets AGENTS and SALES, and a many-to-one relationship exists between observations in data sets SALES and QUOTA. Additionally, a one-to-one relationship exists between AGENTS and QUOTA.

Many-to-Many

The many-to-many relationship implies that multiple observations from each input data set might be related based on values of one or more common variables. Figure 1.4 presents an example of two data sets with a many-to-many relationship.

Figure 1.4. Many-to-Many

Observations in data sets REPAIRS and MAINTENANCE are related by common values for variable VEHICLE. Values of VEHICLE are not unique in either data set. A many-to-many relationship exists between observations in these data sets for values 139 and 593 of VEHICLE.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.141.164.243