Introduction: Manipulating, Subsetting, Concatenating, and Merging Data

Very often, researchers obtain a dataset in which the data are not yet in a form appropriate for analysis. For example, imagine that you are conducting research on job satisfaction. Perhaps you want to compute the correlation between participant age and a single index of job satisfaction. You administer a 10-item questionnaire to 200 employees to assess job satisfaction, and you enter their responses to the 10 individual questionnaire items.

You now need to add together each participant’s responses to those 10 items to arrive at a single composite score that reflects that participant’s overall level of satisfaction. This computation is easy to perform with a few data-manipulation statements in the SAS program. Data-manipulation statements are SAS statements that transform the dataset in some way. They can be used to recode negatively keyed (reverse-scored) variables, create new variables from existing variables, and perform a wide range of other tasks.
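In SAS, this kind of scoring is written as assignment statements inside a DATA step. The short Python sketch below is not SAS syntax. It only mirrors the logic of reverse-scoring negatively keyed items and then summing the items into a composite. The item names (q1 to q10), the 1-to-5 response scale, and the choice of q3 and q7 as negatively keyed items are all hypothetical assumptions.

```python
# Illustrative sketch (not SAS): score a 10-item job-satisfaction questionnaire.
# Assumptions (hypothetical): items are named q1..q10, each is answered on a
# 1-to-5 scale, and items q3 and q7 are negatively keyed.

SCALE_MIN, SCALE_MAX = 1, 5
REVERSE_KEYED = {"q3", "q7"}  # hypothetical negatively keyed items


def reverse_score(value: int) -> int:
    """Recode a negatively keyed response so high values mean high satisfaction."""
    # On a 1-5 scale this maps 1->5, 2->4, 3->3, 4->2, 5->1.
    return SCALE_MIN + SCALE_MAX - value


def composite_score(responses: dict) -> int:
    """Sum the 10 item responses after recoding the negatively keyed items."""
    total = 0
    for i in range(1, 11):
        item = f"q{i}"
        value = responses[item]
        if item in REVERSE_KEYED:
            value = reverse_score(value)
        total += value
    return total


# One made-up participant: every item answered 4, except q3 = 2 and q7 = 1.
participant = {f"q{i}": 4 for i in range(1, 11)}
participant["q3"] = 2
participant["q7"] = 1
print(composite_score(participant))  # 41
```

Reverse-scoring before summing means that a high value contributes to a high satisfaction score on every item. Without this step, the negatively keyed items would pull the composite in the wrong direction.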

At the same time, your original dataset might contain observations that you do not wish to include in your analyses. Perhaps you administered the questionnaire to both hourly and salaried employees, but you want to analyze data only from the hourly employees. You might also want to analyze data only from participants who have usable data on all of the study’s variables. In these situations, you can include data-subsetting statements to eliminate unwanted observations from the sample. Data-subsetting statements are SAS statements that eliminate unwanted observations from a sample so that only a specified subgroup remains in the resulting dataset.
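The filtering logic behind subsetting can be sketched in a few lines. This Python sketch is not SAS syntax. It represents each observation as a dictionary and keeps only hourly employees with no missing values. The field names (id, paytype, age, satis), the use of None for a missing value, and the four observations are all hypothetical.

```python
# Illustrative sketch (not SAS): keep only the observations you want to analyze.
# Assumptions (hypothetical): each observation is a dict with a "paytype"
# field ("hourly" or "salaried"), and a missing value is stored as None.

observations = [
    {"id": 1, "paytype": "hourly", "age": 34, "satis": 41},
    {"id": 2, "paytype": "salaried", "age": 45, "satis": 38},
    {"id": 3, "paytype": "hourly", "age": None, "satis": 29},
    {"id": 4, "paytype": "hourly", "age": 27, "satis": 33},
]


def is_complete(obs: dict) -> bool:
    """Return True when the observation has no missing (None) values."""
    return all(value is not None for value in obs.values())


# Keep hourly employees who have usable data on every variable.
subset = [obs for obs in observations
          if obs["paytype"] == "hourly" and is_complete(obs)]

print([obs["id"] for obs in subset])  # [1, 4]
```

Observation 2 is dropped because the employee is salaried. Observation 3 is dropped because the age value is missing.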

In other situations, it might be necessary to concatenate or merge datasets before you can perform the analyses you desire. When you concatenate datasets, you combine two existing datasets that contain data on the same variables but from different participants. The resulting concatenated dataset contains the data from all participants in both datasets. In contrast, when you merge datasets, you combine two datasets that involve the same participants but contain different variables. For example, assume that dataset D1 contains variables V1 and V2, while dataset D2 contains variables V3 and V4. Further assume that both datasets contain a variable called ID (identification number) that is used to match the records from each participant. Once D1 and D2 are merged, the resulting dataset (D3) contains V1, V2, V3, and V4 as well as ID.
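The difference between concatenating and merging can be made concrete with a small Python sketch. This is not SAS syntax. Datasets are lists of dictionaries, and the variable names follow the D1/D2 example above. All values are made up for illustration. For simplicity, the hypothetical `merge_by_id` helper keeps only IDs that appear in both datasets. By default, SAS's own match-merge (a MERGE statement with a BY variable) also retains unmatched observations and fills in missing values for the variables they lack.

```python
# Illustrative sketch (not SAS): concatenating vs. merging two datasets.
# Datasets are lists of dicts; variable names follow the text (ID, V1-V4),
# and all values are made up for illustration.

# Concatenation: same variables, different participants -> stack the rows.
group_a = [{"ID": 1, "V1": 5, "V2": 3}, {"ID": 2, "V1": 4, "V2": 2}]
group_b = [{"ID": 3, "V1": 2, "V2": 5}]
combined = group_a + group_b

# Merging: same participants, different variables -> match rows on ID.
d1 = [{"ID": 1, "V1": 5, "V2": 3}, {"ID": 2, "V1": 4, "V2": 2}]
d2 = [{"ID": 2, "V3": 7, "V4": 1}, {"ID": 1, "V3": 6, "V4": 0}]


def merge_by_id(left, right):
    """One-to-one merge: combine variables from rows that share the same ID.

    Simplification: IDs missing from either dataset are dropped.
    """
    right_by_id = {row["ID"]: row for row in right}
    merged = []
    for row in left:
        match = right_by_id.get(row["ID"])
        if match is not None:
            merged.append({**row, **match})
    return merged


d3 = merge_by_id(d1, d2)
print(len(combined))  # 3
print(d3[0])  # {'ID': 1, 'V1': 5, 'V2': 3, 'V3': 6, 'V4': 0}
```

Concatenation adds rows (participants). Merging adds columns (variables) to rows that already exist. This is why the merge needs a shared ID to line up each participant's records, and concatenation does not.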

The SAS programming language is so comprehensive and flexible that it can perform virtually any type of manipulation, subsetting, concatenating, or merging task. A complete treatment of these capabilities would easily fill a book. However, this chapter reviews some basic statements that can be used to solve a wide variety of problems that are commonly encountered in social science research (particularly in research that involves the analysis of questionnaire data).
