Manipulating Data

Selected Useful Statements

Here are examples of statements that accomplish specific data-manipulation tasks.
Table 12.1 Manipulating Data Using the DATA Step

Task                               Example Code
---------------------------------  ------------------------------------------
Subset data                        if resthr<70 then delete;
                                   if tolerance='D';
Drop unwanted variables            drop timemin timesec;
Create or modify a variable        TotalTime=(timemin*60)+timesec;
Initialize and retain a variable   retain SumSec 5400;
Accumulate values                  sumsec+totaltime;
Specify a variable's length        length TestLength $ 6;
Execute statements conditionally   if totaltime>800 then TestLength='Long';
                                   else if 750<=totaltime<=800
                                        then TestLength='Normal';
                                   else if totaltime<750
                                        then TestLength='Short';
Label a variable                   label sumsec='Cumulative Total Seconds';
Format a variable                  format sumsec comma6.;
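
The statements in Table 12.1 can be combined in a single DATA step. The following is a minimal sketch, not taken from the original text: the data set names work.stress_raw and work.stress_summary, the ID variable, and all data values are hypothetical and exist only to make the step runnable on its own.

```sas
/* Hypothetical sample data; names and values are illustrative only. */
data work.stress_raw;
   input ID $ RestHR Tolerance $ TimeMin TimeSec;
   datalines;
2458 72 D 12 38
2462 68 D 14 12
2501 78 I 11 51
2523 69 S 10 21
2539 75 D 13 45
;
run;

data work.stress_summary;
   set work.stress_raw;
   if resthr<70 then delete;         /* remove observations with RestHR below 70 */
   if tolerance='D';                 /* subsetting IF: keep only Tolerance='D'   */
   drop timemin timesec;             /* still available below, dropped on output */
   TotalTime=(timemin*60)+timesec;   /* total test time in seconds               */
   retain SumSec 5400;               /* initial value for the running total      */
   sumsec+totaltime;                 /* sum statement accumulates across rows    */
   length TestLength $ 6;            /* set the length before the first assignment */
   if totaltime>800 then TestLength='Long';
   else if 750<=totaltime<=800 then TestLength='Normal';
   else if totaltime<750 then TestLength='Short';
   label sumsec='Cumulative Total Seconds';
   format sumsec comma6.;
run;

proc print data=work.stress_summary label;
run;
```

With this sample data, only the two observations that have RestHR of at least 70 and Tolerance equal to 'D' reach the output data set. The LENGTH statement appears before the first assignment to TestLength so that 'Normal' is not truncated to the length of 'Long'.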

Example: Manipulating Data

The following DATA step reads the data set clinic.cltrials, selects observations and variables, and creates new variables.
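data research.drug1h(drop=placebo uric);
   set clinic.cltrials(drop=triglyc);
   if sex='M' then delete;              /* remove observations for males     */
   if placebo='YES';                    /* keep only the placebo group       */
   retain TestDate '22MAY2000'd;        /* constant SAS date value           */
   retain Days 30;                      /* initial value for the counter     */
   days+1;                              /* sum statement increments Days     */
   length Retest $ 5;                   /* long enough to hold 'CHECK'       */
   if cholesterol>190 then retest='YES';
   else if 150<=cholesterol<=190 then retest='CHECK';
   else if cholesterol<150 then retest='NO';
   label retest='Perform Cholesterol Test 2?';
run;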
To view the contents of research.drug1h as HTML output, use the PRINT procedure.
proc print data=research.drug1h;
run;
Figure 12.2 PROC PRINT Output: research.drug1h
The DATA step reads the data set clinic.cltrials, selects observations and variables, and creates the data set research.drug1h containing two observations and six variables.

Where to Specify the DROP= and KEEP= Data Set Options

Recall that you can specify the DROP= and KEEP= data set options anywhere you name a SAS data set. In a DATA step, you can specify them in either the DATA statement or the SET statement, depending on whether you want to drop variables from the output data set or from the source data set:
  • If you never reference certain variables and you do not want them to appear in the new data set, use a DROP= option in the SET statement.
    In the DATA step shown below, the DROP= option in the SET statement prevents the variables Triglycerides and UricAcid from being read. These variables do not appear in the research.drug1h data set.
    data research.drug1h(drop=placebo); 
       set clinic.cltrials(drop=triglycerides uricacid);
       if placebo='YES'; 
    run;
  • If you do need to reference a variable from the original data set (in a subsetting IF statement, for example), specify the variable in the DROP= or KEEP= option in the DATA statement. If you drop it in the SET statement instead, the variable is never read, and the statement that references it uses a missing value for that variable.
    The DATA step shown above uses the variable Placebo to select observations. To drop Placebo from the new data set, the DROP= option must appear in the DATA statement.
    When used in the DATA statement, the DROP= option simply drops the variables from the new data set. However, they are still read from the original data set and are available within the DATA step.
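
The placement rules above apply to KEEP= in the same way as to DROP=. The following sketch writes the same selection both ways. It is an illustration only: work.cltrials_demo is a hypothetical three-row stand-in for clinic.cltrials, and the ID variable and all values are invented.

```sas
/* Hypothetical stand-in for clinic.cltrials; values are illustrative only. */
data work.cltrials_demo;
   input ID $ Placebo $ Cholesterol Triglycerides UricAcid;
   datalines;
A01 YES 210 150 4.1
A02 NO 175 180 5.0
A03 YES 140 120 3.8
;
run;

/* DROP= in the SET statement: Triglycerides and UricAcid are never read.
   DROP= in the DATA statement: Placebo is read and tested by the
   subsetting IF, then left out of the output data set. */
data work.drug_drop(drop=placebo);
   set work.cltrials_demo(drop=triglycerides uricacid);
   if placebo='YES';
run;

/* The same selection written with KEEP=: list the variables to retain. */
data work.drug_keep(keep=id cholesterol);
   set work.cltrials_demo(keep=id placebo cholesterol);
   if placebo='YES';
run;
```

Both output data sets contain only ID and Cholesterol for the observations where Placebo is 'YES'. Placebo must be available on the SET statement in each version, because the subsetting IF statement references it.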
Last updated: January 10, 2018