Example 8.15 Applying the Same Operation to a Group of Variables

Goal

Apply operations to selected numeric and character variables in a data set by using arrays without explicitly listing all the variables' names.

Example Features

Featured Step: DATA step
Featured Step Options and Statements: ARRAY statement, _NUMERIC_ and _CHARACTER_ lists

Input Data Set

Data set GLUCOSE records glucose measurements taken at four different times for each of four patients. The data set has four variables allocated to hold the evaluations of the four glucose results. When this data set is created, the evaluations have not yet been completed, so each one is coded with a question mark (?).

                                      GLUCOSE

     patient_ glucose_ eval_ glucose_          glucose_          glucose_
 Obs    id      fast   fast     1hr   eval_1hr    2hr   eval_2hr    4hr   eval_4hr
  1   ARD123     101     ?      135      ?         98      ?          .      ?
  2   DKJ891      75     ?       88      ?        103      ?         79      ?
  3   EWP006       .     ?        .      ?          .      ?          .      ?
  4   TAB234      79     ?       94      ?        126      ?        133      ?
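The example does not show how GLUCOSE is created. The following DATA step is a minimal sketch that rebuilds it from the listing above so that the program can be run. It assumes that the variables are stored in the order shown in the listing and that the character variables need only lengths of 6 (PATIENT_ID) and 1 (the evaluations); both lengths are assumptions. PROC CONTENTS with the VARNUM option then lists the variables by their position in the data set, which is the order that the variable lists in the program depend on.

```sas
/* Sketch: rebuild GLUCOSE from the listing. The character lengths    */
/* ($6. and $1.) are assumptions. Modified list input (:$n.) sets     */
/* each length without changing the order of the variables.          */
data glucose;
  input patient_id :$6.
        glucose_fast eval_fast :$1.
        glucose_1hr  eval_1hr  :$1.
        glucose_2hr  eval_2hr  :$1.
        glucose_4hr  eval_4hr  :$1.;
  datalines;
ARD123 101 ? 135 ? 98 ? . ?
DKJ891 75 ? 88 ? 103 ? 79 ?
EWP006 . ? . ? . ? . ?
TAB234 79 ? 94 ? 126 ? 133 ?
;
run;

/* VARNUM lists the variables in their stored (PDV) order */
proc contents data=glucose varnum;
run;
```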

Resulting Data Set

Output 8.15 EVAL_GLUCOSE Data Set

            Example 8.15 EVAL_GLUCOSE Data Set Created with DATA Step

    patient_ glucose_ result_ glucose_ result_ glucose_ result_ glucose_ result_
Obs    id      fast    fast      1hr     1hr      2hr     2hr      4hr     4hr

 1   ARD123     101      P       135      H        98      N         .      -
 2   DKJ891      75      N        88      N       103      P        79      N
 3   EWP006       .      -         .      -         .      -         .      -
 4   TAB234      79      N        94      N       126      H       133      H


Example Overview

This DATA step shows how you can specify lists of variables according to type by using the _NUMERIC_ and _CHARACTER_ keywords.

The DATA step evaluates data set GLUCOSE, which contains four glucose results for each of four patients. The results are saved in numeric variables and the evaluations in character variables, and the variables are ordered so that each result is immediately followed by its evaluation. Because of this structure, and because all the elements of an array must be of the same type, you cannot specify the array elements by writing the beginning and ending variables separated with a double dash (e.g., A--B): such a list would include both numeric and character variables. Additionally, the names of the variables that store the results and evaluations do not end in sequential numeric suffixes, so you cannot list the elements with the beginning and ending elements separated with a single dash (e.g., X1-X12).

The DATA step defines two arrays: numeric array GLUCOSE, which holds the results, and character array EVALS, which holds the evaluations. The only numeric variables in the data set are the results, so the list in the ARRAY statement for GLUCOSE can be specified simply with the _NUMERIC_ keyword. Variable PATIENT_ID is also a character variable, so you cannot specify the list of array elements for EVALS with just the _CHARACTER_ keyword. Instead, you can specify the beginning and ending elements of the array and separate them with the keyword CHARACTER enclosed in single dashes (EVAL_FAST-CHARACTER-EVAL_4HR). This form indicates that the elements of the array are all the character variables from the beginning element through the ending element, inclusive, in the order in which the variables are stored.
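To see which variables each kind of list selects, the following sketch defines, in a DATA _NULL_ step, variables with the same names, types, and order as GLUCOSE and writes the number of elements in several arrays to the log. The LENGTH statement only stands in for GLUCOSE (the lengths are assumptions); no observations are read, because array lists are resolved at compile time from the variable order alone.

```sas
/* Sketch: compare variable lists by type. The LENGTH statement      */
/* defines variables in the same order as GLUCOSE (lengths assumed). */
data _null_;
  length patient_id $ 6
         glucose_fast 8 eval_fast $ 1
         glucose_1hr  8 eval_1hr  $ 1
         glucose_2hr  8 eval_2hr  $ 1
         glucose_4hr  8 eval_4hr  $ 1;

  array allnum{*}  _numeric_;                    /* the four results      */
  array allchar{*} _character_;                  /* PATIENT_ID + 4 evals  */
  array evals{*}   eval_fast-character-eval_4hr; /* the four evals only   */
  array between{*} eval_fast-numeric-eval_4hr;   /* numerics between them */

  /* Created after the ARRAY statements, so not part of _NUMERIC_ above */
  n_numeric   = dim(allnum);
  n_character = dim(allchar);
  n_evals     = dim(evals);
  n_between   = dim(between);
  put n_numeric= n_character= n_evals= n_between=;

  /* A double-dash list takes every variable of either type between   */
  /* the two names: valid in PUT, but not in a single-type ARRAY.      */
  put (eval_fast--eval_4hr) (=);
run;
```

The log should report n_numeric=4 n_character=5 n_evals=4 n_between=3. The _CHARACTER_ list picks up PATIENT_ID as well as the four evaluations, which is why the example uses the -CHARACTER- form, while -NUMERIC- between the same two names selects only the three results that lie between them.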

You must know the order in which your variables are saved in the Program Data Vector (PDV) when specifying a list of variables that includes either the _NUMERIC_ or _CHARACTER_ keyword. You must also know the characteristics of all the numeric or character variables if you use just the _NUMERIC_ or _CHARACTER_ keyword. Without understanding the structure and contents of your data set, you could inadvertently modify variable values when you use these keywords. Run PROC CONTENTS or PROC DATASETS to examine your data set.

See also the "A Closer Look" section "Specifying Variable Lists" in Example 9.3 for more applications of the _CHARACTER_ and _NUMERIC_ keywords.

Program

Create data set EVAL_GLUCOSE. Read the observations from data set GLUCOSE. Define the array of numeric variables. Because the only numeric variables in GLUCOSE are the ones that belong in the array, use the _NUMERIC_ keyword to specify them. Define the array of character variables. Because PATIENT_ID is also a character variable, do not write the list as _CHARACTER_. Instead, specify the list of EVALS elements with the beginning and ending evaluation variables separated with the -CHARACTER- keyword.

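Assign an evaluation code to each glucose result: a dash (-) for a missing result, N for a result below 100, P for a result from 100 through 125, and H for a result above 125.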

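data eval_glucose;
  set glucose;

  /* The only numeric variables in GLUCOSE are the four results */
  array glucose{*} _numeric_;

  /* All character variables from EVAL_FAST through EVAL_4HR;  */
  /* PATIENT_ID is excluded because it precedes EVAL_FAST      */
  array evals{*} eval_fast-character-eval_4hr;

  drop i;
  do i=1 to dim(glucose);
    /* Test for missing first: a missing value compares lower */
    /* than any number and would otherwise be coded N         */
    if glucose{i}=. then evals{i}='-';
    else if glucose{i} lt 100 then evals{i}='N';
    else if 100 le glucose{i} le 125 then evals{i}='P';
    else if glucose{i} gt 125 then evals{i}='H';
  end;
run;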
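The IF-THEN/ELSE chain in the loop hard-codes the cutoffs. As an alternative that the example itself does not use, the cutoffs could be kept in a user-defined format and applied with the PUT function. This sketch assumes the same ranges as the program (a dash for missing, N below 100, P from 100 through 125, H above 125); the format name GLUCODE is hypothetical. Because LOW does not include missing values in a numeric format, the ordinary missing value is given its own range.

```sas
/* Sketch: store the evaluation cutoffs in a format (GLUCODE is a */
/* hypothetical name). LOW excludes missing values for numeric    */
/* formats, so the missing value (.) has its own range.           */
proc format;
  value glucode .           = '-'
                low -< 100  = 'N'
                100 - 125   = 'P'
                125 <- high = 'H';
run;

/* Apply the format to a few sample results from GLUCOSE */
data _null_;
  array sample{5} _temporary_ (. 75 101 125 126);
  length code $ 1;
  do i=1 to dim(sample);
    result = sample{i};
    code   = put(result, glucode.);
    put result= code=;
  end;
run;
```

Inside the example's DO loop, the four IF statements could then be replaced with the single statement evals{i}=put(glucose{i},glucode.); so that changing a cutoff means editing only the format.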
