How to Prepare Your Data Sets

Determining the Structure and Contents of Data Sets

Typically, data comes from multiple sources and might be in different formats. Many applications require input data to be in a specific format before the data can be processed. Although application requirements vary, there are common factors for all applications that access, combine, and process data. You can identify these common factors for your data. Here are tasks to help you start:
  • Determine how the input data is related.
  • Ensure that the data is properly sorted or indexed, if necessary.
  • Select the appropriate access method to process the input data.
  • Select the appropriate SAS tools to complete the task.
You can use the CONTENTS, DATASETS, and PRINT procedures to review the structure of your data.
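The following is a minimal sketch of such a review. It assumes that the SASHELP.CLASS sample data set is available in your installation; substitute your own library and data set names.

```sas
/* Show the descriptor information: column names, types, lengths, */
/* and any sort or index information.                              */
proc contents data=sashelp.class;
run;

/* List the data sets that are stored in a library. */
proc datasets library=sashelp memtype=data;
quit;

/* Print the first few rows to inspect the actual values. */
proc print data=sashelp.class(obs=5);
run;
```

The OBS= data set option limits the listing to a few rows, which keeps the output readable for large data sets.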
Relationships among multiple sources of input data exist when each of the sources contains common data, either at the physical or logical level. For example, employee data and department data could be related through an employee ID variable that shares common values. Another data set could contain numeric sequence numbers whose partial values logically relate it to a separate data set by observation number.
You must be able to identify the existing relationships in your data. This knowledge is crucial for understanding how to process input data in order to produce desired results. All related data falls into one of these four categories, characterized by how observations relate among the data sets:
  • one-to-one
  • one-to-many
  • many-to-one
  • many-to-many
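One way to find out which category applies is to count how often each value of the common variable occurs in each data set. The sketch below uses two hypothetical data sets, WORK.DEPARTMENTS and WORK.EMPLOYEES, related through an illustrative DeptID variable; none of these names come from your data.

```sas
/* Hypothetical sample data: one row per department. */
data work.departments;
   input DeptID $ DeptName $;
   datalines;
D1 Sales
D2 Finance
;
run;

/* Hypothetical sample data: several employees can share a department. */
data work.employees;
   input EmpID $ DeptID $;
   datalines;
E1 D1
E2 D1
E3 D2
;
run;

/* List the key values that occur more than once in each data set. */
proc sql;
   title 'Repeated DeptID values in DEPARTMENTS';
   select DeptID, count(*) as RowsPerKey
      from work.departments
      group by DeptID
      having count(*) > 1;

   title 'Repeated DeptID values in EMPLOYEES';
   select DeptID, count(*) as RowsPerKey
      from work.employees
      group by DeptID
      having count(*) > 1;
quit;
title;
```

If no key value repeats in either data set, the relationship is likely one-to-one. If key values repeat in only one data set, it is one-to-many or many-to-one. If they repeat in both, it is many-to-many.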
Finally, to obtain the desired results, you should understand how each method of combining data sets handles observations and how each treats duplicate, missing, or unmatched values of common variables. Some of the methods require that you preprocess your data sets by sorting them or creating indexes. Testing is a good first step.
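As a minimal sketch of the preprocessing mentioned above, the following code sorts a data set by a common variable and then creates a simple index on that variable. It assumes that the SASHELP.CLASS sample data set is available; the output name WORK.CLASS_SORTED is illustrative.

```sas
/* Sort a copy of the data by the variable used to combine data sets. */
proc sort data=sashelp.class out=work.class_sorted;
   by Name;
run;

/* Create a simple index on the same variable. */
proc datasets library=work nolist;
   modify class_sorted;
   index create Name;
quit;
```

Writing the sorted rows to a new data set with OUT= leaves the original input unchanged.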

Testing Your Program

Create small temporary data sets that contain a sample of rows that test all of your program's logic. If your logic is faulty and you get unexpected output, the small sample makes it easier to debug your program.
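A small test data set might look like the following sketch. The data set name, variable names, and values are hypothetical; the rows are chosen to include a missing numeric value, a duplicate key, and a missing character value.

```sas
/* Data sets in the WORK library are temporary and are deleted */
/* when the SAS session ends.                                  */
data work.test_emp;
   input EmpID $ DeptID $ Salary;
   datalines;
E1 D1 50000
E2 D1 .
E2 D1 42000
E3 .  38000
;
run;

/* Confirm that the test rows were read as intended. */
proc print data=work.test_emp;
run;
```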

Looking at Sources of Common Problems

If your program does not run correctly, review your input data for the following errors:
  • columns that have the same name but that represent different data
    To correct the error, rename the columns before you combine the data sets by using the RENAME= data set option in the SET or MERGE statement. As an alternative, use the RENAME statement in the DATASETS procedure, which provides library management functions for all member types (except catalogs).
  • common columns that have the same data but different attributes
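For the first problem in this list, columns that share a name but represent different data, the following sketch shows the RENAME= data set option in a MERGE statement. The data sets WORK.SALES and WORK.REFUNDS and their variables are hypothetical.

```sas
/* Both hypothetical data sets use the name Amount for different data. */
data work.sales;
   input ID $ Amount;
   datalines;
A1 100
A2 250
;
run;

data work.refunds;
   input ID $ Amount;
   datalines;
A1 20
A2 0
;
run;

/* Rename each Amount column as it is read so that neither value */
/* overwrites the other in the combined data set.                */
data work.combined;
   merge work.sales(rename=(Amount=SaleAmount))
         work.refunds(rename=(Amount=RefundAmount));
   by ID;
run;
```

Because RENAME= is applied as the data sets are read, the stored input data sets keep their original column names.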
Last updated: August 23, 2018