Apply operations to selected numeric and character variables in a data set by using arrays without explicitly listing all the variables' names.
Featured Step: DATA step
Featured Step Options and Statements: ARRAY statement, _NUMERIC_ and _CHARACTER_ lists
Data set GLUCOSE records several glucose measurements at different times for four patients. The data set has allocated four variables to hold the evaluations of the four glucose results. When this data set is created, the evaluations are not completed and are coded with question marks (?).
GLUCOSE

Obs  patient_id  glucose_fast  eval_fast  glucose_1hr  eval_1hr  glucose_2hr  eval_2hr  glucose_4hr  eval_4hr
  1  ARD123               101  ?                  135  ?                  98  ?                   .  ?
  2  DKJ891                75  ?                   88  ?                 103  ?                  79  ?
  3  EWP006                 .  ?                    .  ?                   .  ?                   .  ?
  4  TAB234                79  ?                   94  ?                 126  ?                 133  ?
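To experiment with this example, the GLUCOSE data set could be re-created with a DATA step along the following lines. This is only a sketch built from the listing: the source does not show how GLUCOSE was originally created, so the LENGTH for PATIENT_ID and the lengths of the evaluation variables are assumptions. The INPUT statement lists the variables in the same order as the listing so that each result is followed by its evaluation.

```sas
/* Hypothetical reconstruction of GLUCOSE from the listing.          */
/* Variable lengths are assumptions; only names, order, and values   */
/* come from the listing.                                            */
data glucose;
  length patient_id $ 6
         glucose_fast 8 eval_fast $ 1
         glucose_1hr  8 eval_1hr  $ 1
         glucose_2hr  8 eval_2hr  $ 1
         glucose_4hr  8 eval_4hr  $ 1;
  /* List input: a period is read as a missing numeric value */
  input patient_id $ glucose_fast eval_fast $
        glucose_1hr eval_1hr $
        glucose_2hr eval_2hr $
        glucose_4hr eval_4hr $;
  datalines;
ARD123 101 ? 135 ? 98 ? . ?
DKJ891 75 ? 88 ? 103 ? 79 ?
EWP006 . ? . ? . ? . ?
TAB234 79 ? 94 ? 126 ? 133 ?
;
run;
```

Declaring the variables in the LENGTH statement fixes their order in the Program Data Vector, which matters for the name range list used later in this example.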
Output 8.15 EVAL_GLUCOSE Data Set

Example 8.15 EVAL_GLUCOSE Data Set Created with DATA Step

Obs  patient_id  glucose_fast  result_fast  glucose_1hr  result_1hr  glucose_2hr  result_2hr  glucose_4hr  result_4hr
  1  ARD123               101  P                    135  H                    98  N                     .  -
  2  DKJ891                75  N                     88  N                   103  P                    79  N
  3  EWP006                 .  -                      .  -                     .  -                     .  -
  4  TAB234                79  N                     94  N                   126  H                   133  H
This DATA step shows how you can specify lists of variables according to type by using the _NUMERIC_ and _CHARACTER_ keywords.
The DATA step evaluates data set GLUCOSE, which contains four glucose results for four patients. The results are saved in numeric variables. The evaluations are saved in character variables. The variables are ordered so that each result is followed by its evaluation. Because of this structure and because an array must have variables of only one type, you cannot specify the array elements by writing the beginning and ending variables separated with a double dash (e.g., A--B). Additionally, the variables that store the results and evaluations do not end in sequential numeric suffixes, so you cannot list the elements with the beginning and ending elements separated with a single dash (e.g., X1-X12).
The DATA step defines two arrays: numeric array GLUCOSE with the results and character array EVALS with the evaluations. The only numeric variables in the data set are the results, so the list in the ARRAY statement can be specified simply with the _NUMERIC_ keyword. Variable PATIENT_ID is a character variable, so you cannot specify the list of array elements for EVALS with just the _CHARACTER_ keyword. Instead you can specify the beginning and ending elements of the array and separate them with the CHARACTER keyword, written as -CHARACTER- between the two names. This structure indicates that the elements of the array are all the character variables between the beginning and ending elements, inclusive.
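The difference between the two character lists can be checked with a small DATA step such as the sketch below. It assumes data set GLUCOSE exists as described; the array name ALL_CHARS and the variables N_ALL_CHARS and N_EVALS are hypothetical names introduced only for this illustration. The step writes the number of elements in each array to the SAS log and creates no data set.

```sas
/* Illustrative check only; ALL_CHARS, N_ALL_CHARS, and N_EVALS */
/* are hypothetical names that are not part of the example.     */
data _null_;
  set glucose(obs=1);

  /* Every character variable defined so far, including PATIENT_ID */
  array all_chars{*} _character_;

  /* Only the character variables positioned from EVAL_FAST */
  /* through EVAL_4HR                                       */
  array evals{*} eval_fast-character-eval_4hr;

  n_all_chars = dim(all_chars);
  n_evals     = dim(evals);

  /* Write both element counts to the log */
  put n_all_chars= n_evals=;
run;
```

Because PATIENT_ID is a character variable, the count for ALL_CHARS should be one higher than the count for EVALS, which is why the example does not use _CHARACTER_ alone.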
You must know the order in which your variables are saved in the Program Data Vector (PDV) when specifying a list of variables that includes either the _NUMERIC_ or _CHARACTER_ keyword. You must also know the characteristics of all the numeric or character variables if you use just the _NUMERIC_ or _CHARACTER_ keyword. Without understanding the structure and contents of your data set, you could inadvertently modify variable values when you use these keywords. Run PROC CONTENTS or PROC DATASETS to examine your data set.
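One way to check the variable order before writing the ARRAY statements is sketched below. It assumes that data set GLUCOSE is in the WORK library. The VARNUM option asks PROC CONTENTS to list the variables by their position in the data set rather than alphabetically, and the listing also shows each variable's type.

```sas
/* List the variables of GLUCOSE in creation order, with their types, */
/* so that the start and end of a name range list can be verified.    */
proc contents data=glucose varnum;
run;
```

The position order in this listing is the order that name range lists such as EVAL_FAST-CHARACTER-EVAL_4HR rely on.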
See also the "A Closer Look" section "Specifying Variable Lists" in Example 9.3 for more applications of the _CHARACTER_ and _NUMERIC_ keywords.
Create data set EVAL_GLUCOSE. Read the observations from data set GLUCOSE.
Define the array of numeric variables. Because the only numeric variables in GLUCOSE are the ones that belong in the ARRAY statement, use the _NUMERIC_ keyword to specify them.
Define the array of character variables. Because PATIENT_ID is a character variable, do not write the list as "_CHARACTER_". Instead specify the list of EVALS elements with the beginning and ending evaluation variables separated with the CHARACTER keyword.
Assign an evaluation code to each glucose result.
data eval_glucose;
  set glucose;
  /* All numeric variables: the four glucose results */
  array glucose{*} _numeric_;
  /* Character variables from EVAL_FAST through EVAL_4HR only */
  array evals{*} eval_fast-character-eval_4hr;
  drop i;
  do i=1 to dim(glucose);
    if glucose{i}=. then evals{i}='-';
    else if glucose{i} lt 100 then evals{i}='N';
    else if 100 le glucose{i} le 125 then evals{i}='P';
    else if glucose{i} gt 125 then evals{i}='H';
  end;
run;