Reviewing Terminology

Overview

Before you examine the various techniques for combining data horizontally, here is a review of some of the terminology that this chapter uses.
Term
Definition
combining data horizontally
A technique in which data is retrieved from an auxiliary source or sources, based on the values of variables in the primary source.
performing a table lookup
A technique in which data is retrieved from an auxiliary source or sources, based on the values of variables in the primary source.
base table
The primary source in a horizontal combination. In this chapter, the base table is always a SAS data set.
lookup table or tables
Any input data source, except the base table, that is used in a horizontal combination.
lookup values or return value
The data value or values that are retrieved from the lookup table or tables during a horizontal combination.
key variable or variables
One or more variables that reside in both the base table and the lookup table. Usually, key values are unique in the lookup table but are not necessarily unique in the base table.
key value or values
A lookup is successful when the key value in the base table finds a matching key value in the lookup table.
Note: The terms combining data horizontally and performing a table lookup are synonymous and are used interchangeably throughout this chapter.
Note: This chapter compares PROC SQL techniques with DATA step techniques. In PROC SQL terms, a SAS data set is usually referred to as a table, a variable is usually referred to as a column, and an observation is usually referred to as a row.
The following figure illustrates a base table and a lookup table that are used in a horizontal combination. The key variable is Num. The key values are listed vertically below Num.
horizontal combination

Relationships between Input Data Sources

One important factor to consider when you perform a table lookup is the relationship between the input data sources. In order to combine data horizontally, you must be able to match observations from each input data source. For example, there might be one or more variables that are common to each input data source. The relationship between input data sources describes how the observations in one source relate to the observations in the other source according to these key values.
The following terms describe the possible relationships between base tables and lookup tables:
  • one-to-one match
  • one-to-many match
  • many-to-many match
  • nonmatching data
In a one-to-one match, key values in both the base table and the lookup table are unique. Therefore, for each observation in the base table, no more than one observation in the lookup table has a matching key value.
one-to-one match
In a one-to-many match, key values in the base table are unique, but key values in the lookup table are not unique. That is, for each observation in the base table, there can be one observation or possibly multiple observations in the lookup table that have a matching key value.
one-to-one match
In a many-to-many match, key values are not unique in the base table or in the lookup table. That is, at least one observation in the base table matches multiple observations in the lookup table, and at least one observation in the lookup table matches multiple observations in the base table.
Many-to-many is a theoretical or mathematical possibility, but it rarely represents a legitimate business application. If a business application appears to be many-to-many, it is usually because a duplicate row was accidentally added to the lookup table.
one-to-one match
Sometimes you will have a one-to-one, a one-to-many, or a many-to-many match that also includes nonmatching data. That is, there are observations in the base table that do not match any observations in the lookup table, or there are observations in the lookup table that do not have matching observations in the base table. If your base table or lookup table or tables include nonmatching data, you will have one of the following:
  • a dense match, in which every or nearly every observation in one table has a matching observation in the other table. In the following figure, the first observation in the base table is unmatched.
    dense match
  • a sparse match, in which there are many nonmatching observations.
    sparse match
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.227.134.133