Process a specific number of observations at the beginning and end of a data set. Efficiently access the observations by not reading the entire data set. If the total number of observations in the data set is less than the sum of observations to access, process the entire data set.
Featured Step | DATA step |
Featured Step Options and Statements | Direct access, SET statement, NOBS= and POINT= options |
Data set BLOODPRESSURE contains 20 blood pressure measurements.
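BLOODPRESSURE

| Obs | bptime | systolic | diastolic | pulse |
|---|---|---|---|---|
| 1 | 8:20 | 160 | 90 | 99 |
| 2 | 8:22 | 171 | 92 | 103 |
| 3 | 8:24 | 158 | 88 | 102 |
| 4 | 8:30 | 155 | 90 | 93 |
| 5 | 8:43 | 144 | 88 | 90 |
| 6 | 8:51 | 145 | 82 | 88 |
| 7 | 8:59 | 140 | 80 | 86 |
| 8 | 9:02 | 138 | 82 | 84 |
| 9 | 9:06 | 130 | 80 | 78 |
| 10 | 9:09 | 130 | 76 | 75 |
| 11 | 9:13 | 128 | 77 | 78 |
| 12 | 9:18 | 126 | 75 | 73 |
| 13 | 9:25 | 125 | 75 | 72 |
| 14 | 9:31 | 122 | 73 | 74 |
| 15 | 9:42 | 124 | 75 | 70 |
| 16 | 9:45 | 123 | 73 | 68 |
| 17 | 9:50 | 120 | 73 | 67 |
| 18 | 9:52 | 115 | 70 | 67 |
| 19 | 9:55 | 116 | 73 | 66 |
| 20 | 9:59 | 115 | 68 | 65 |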
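Output 8.5 FIRSTLAST4BP Data Set

Example 8.5 FIRSTLAST4BP Data Set Created with DATA Step

| Obs | measurement | bptime | systolic | diastolic | pulse |
|---|---|---|---|---|---|
| 1 | 1 | 8:20 | 160 | 90 | 99 |
| 2 | 2 | 8:22 | 171 | 92 | 103 |
| 3 | 3 | 8:24 | 158 | 88 | 102 |
| 4 | 4 | 8:30 | 155 | 90 | 93 |
| 5 | 17 | 9:50 | 120 | 73 | 67 |
| 6 | 18 | 9:52 | 115 | 70 | 67 |
| 7 | 19 | 9:55 | 116 | 73 | 66 |
| 8 | 20 | 9:59 | 115 | 68 | 65 |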
This example shows you how to write a DATA step that can select a specific number of observations from the beginning and the end of a data set. The DATA step accesses the observations directly by observation number by using the POINT= option in the SET statement; it does not process the data set sequentially.
Data set BLOODPRESSURE contains 20 blood pressure measurements for a person over a period of time.
The following DATA step selects the first four observations and last four observations from BLOODPRESSURE. The NOBS= option in the SET statement assigns to variable TOTALOBS the number of observations in BLOODPRESSURE.
The value of TOTALOBS is assigned during compilation before the DATA step executes. Therefore, the value of TOTALOBS is available at the top of the DATA step when execution commences.
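The compile-time behavior of NOBS= can be checked with a small step such as the following. This is a minimal sketch that assumes BLOODPRESSURE exists in the WORK library; the `if 0 then set` construct is a common idiom for compiling a SET statement without ever executing it, and it is not part of the featured program.

```sas
data _null_;
  /* The condition is never true, so this SET statement never executes. */
  /* It is still compiled, and that is when NOBS= fills TOTALOBS.       */
  if 0 then set bloodpressure nobs=totalobs;

  /* Write the observation count to the SAS log. */
  put totalobs=;

  /* No observation is ever read, so end the step explicitly. */
  stop;
run;
```

For the 20-observation BLOODPRESSURE data set, the log line should report a value of 20 even though no observation was read.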
The IF-THEN block executes when BLOODPRESSURE contains at least as many observations as the sum of the number of observations to read from the beginning and from the end of the data set. It contains a DO loop with two ranges of index values: the first specifies the range of observations to access at the beginning of the data set, and the second specifies the range of observations to access at the end of the data set. The DO loop in this example reads observations 1 through 4 and 17 through 20.
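A DO loop with more than one range can be tried on its own before it is combined with direct access. The sketch below hardcodes the two ranges used in this example and only writes the index values to the log.

```sas
data _null_;
  /* Two comma-separated ranges in one iterative DO statement. */
  /* The loop walks 1 to 4 first and then continues with 17 to 20. */
  do i=1 to 4, 17 to 20;
    put i=;
  end;
run;
```

The log shows eight lines, one for each index value, in the order in which the featured DATA step reads the observations.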
If the check on the number of observations did not exist, and the first DO loop executed unconditionally when the sum of TOTAL1 and TOTAL2 was greater than TOTALOBS, your output data set would contain duplicates of some observations because of the overlap of the two ranges of observations. Additionally, the ranges that were specified might attempt to access observations that don't exist, which would result in errors.
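The overlap and the out-of-range access can be seen by working through the range arithmetic for data sets that are too small. The following sketch uses two hypothetical sizes, 6 and 3 observations, which are assumptions for illustration and do not come from BLOODPRESSURE. It computes only the index values and reads no data.

```sas
data _null_;
  total1=4;
  total2=4;

  /* Hypothetical data set sizes that are smaller than TOTAL1+TOTAL2. */
  do totalobs=6, 3;
    start1=1;
    end1=start1+total1-1;
    start2=totalobs-total2+1;
    end2=totalobs;
    put totalobs= start1= end1= start2= end2=;
  end;
run;
```

With 6 observations the ranges are 1 through 4 and 3 through 6, so observations 3 and 4 would be read twice. With 3 observations the ranges are 1 through 4 and 0 through 3, so the loop would also ask for observations 0 and 4, neither of which exists.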
The ELSE-DO block executes when the total number of observations in the data set is less than the sum of the number of observations to access from the beginning and the end of the data set. The DO loop in this block simply reads every observation in the data set, so the output data set has the same number of observations as the input data set.
The DATA step iterates only once; only the DO loops iterate, once for each observation that they read. Therefore, simple assignment statements are sufficient to set the constant values of the variables that specify the DO loop index values. In contrast, when a DATA step iterates once for every observation in a data set, it is usually more efficient to set constant values with a RETAIN statement, which initializes them once, than with assignment statements that execute on every iteration.
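For comparison, the sketch below shows a sequential version of the same selection, in which the DATA step iterates once per observation and RETAIN supplies the constants. It is not the approach featured here: it reads every observation in BLOODPRESSURE, which is what direct access avoids. The output name FIRSTLAST4BP_SEQ is a hypothetical name chosen for this illustration.

```sas
data firstlast4bp_seq;
  keep measurement bptime systolic diastolic pulse;

  /* Initial values are set once, not on every iteration. */
  retain total1 4 total2 4;

  /* Sequential read: the step iterates once per observation. */
  set bloodpressure nobs=totalobs;

  /* With a single SET statement, _N_ matches the observation number. */
  measurement=_n_;

  /* Subsetting IF: keep only the first TOTAL1 and last TOTAL2 observations. */
  if _n_ le total1 or _n_ gt totalobs-total2;
run;
```

Because the test is applied to each observation exactly once, this version cannot write duplicates when the data set is small, but its cost grows with the size of the input data set.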
Create data set FIRSTLAST4BP.
Assign constant values to the two variables that define how many observations to read from the beginning of the data set (TOTAL1) and from the end of the data set (TOTAL2).
Execute this block if the total number of observations in BLOODPRESSURE is greater than or equal to the sum of the total number of observations to access from BLOODPRESSURE.
Specify the lower index value of the range of observations to select from the beginning of the data set.
Compute the upper index value of the range of observations to select from the beginning of the data set.
Compute the lower index value of the range of observations to select from the end of the data set.
Specify the upper index value of the range of observations to select from the end of the data set.
Specify a DO loop with two sets of ranges, the first for the observations to access from the beginning of the data set and the second for the observations to access from the end of the data set.
Assign the observation number that is currently being accessed to variable MEASUREMENT. Because I is designated as the POINT= variable, SAS considers it a temporary variable. Therefore, to save its value in the output data set, it must be assigned to a data set variable.
Read the ith observation from BLOODPRESSURE. Assign the total number of observations in BLOODPRESSURE to variable TOTALOBS.
Output each observation that is accessed directly.
Execute this block when the total number of observations in BLOODPRESSURE is less than the sum of the total number of observations to access from BLOODPRESSURE.
Execute this loop once for each observation in BLOODPRESSURE.
Assign the observation number that is currently being accessed to variable MEASUREMENT.
Read the ith observation from data set BLOODPRESSURE. Assign the total number of observations in BLOODPRESSURE to variable TOTALOBS.
Output each observation that is accessed directly, which in this loop is every observation in BLOODPRESSURE.
Use a STOP statement to prevent the DATA step from looping continuously, because there is no end-of-file condition when SAS reads data with direct access.
data firstlast4bp;
  keep measurement bptime systolic diastolic pulse;

  total1=4;
  total2=4;
  if totalobs ge (total1+total2) then do;
    start1=1;
    end1=start1+total1-1;
    start2=totalobs-total2+1;
    end2=totalobs;
    do i=start1 to end1,start2 to end2;
      measurement=i;
      set bloodpressure point=i nobs=totalobs;
      output;
    end;
  end;
  else do;
    do i=1 to totalobs;
      measurement=i;
      set bloodpressure point=i nobs=totalobs;
      output;
    end;
  end;
  stop;
run;
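The roles of the explicit OUTPUT and STOP statements are easier to see in a step that reads a single observation directly. The following is a minimal sketch: the data set name FIFTHBP and the choice of observation 5 are assumptions made for illustration.

```sas
data fifthbp;
  keep measurement bptime systolic diastolic pulse;

  /* Observation number to fetch (illustrative choice). */
  obsnum=5;

  /* Direct access: read only the requested observation. */
  set bloodpressure point=obsnum;

  /* POINT= variables are temporary, so copy the value to keep it. */
  measurement=obsnum;

  /* STOP ends the step without the implicit output, so write explicitly. */
  output;

  /* Direct access never signals end of file; end the step here. */
  stop;
run;
```

FIFTHBP should contain one observation, the measurement taken at 8:43. Omitting OUTPUT would leave the data set empty, and omitting STOP would let the step repeat without ever reaching an end-of-file condition.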