Example 8.5 Accessing a Specific Number of Observations from the Beginning and End of a Data Set

Goal

Process a specific number of observations at the beginning and end of a data set. Efficiently access the observations by not reading the entire data set. If the total number of observations in the data set is less than the sum of observations to access, process the entire data set.

Example Features

Featured Step: DATA step
Featured Step Options and Statements: Direct access, SET statement, NOBS= and POINT= options

Input Data Set

Data set BLOODPRESSURE contains 20 blood pressure measurements.

                         BLOODPRESSURE

        Obs    bptime    systolic    diastolic    pulse
          1     8:20        160          90         99
          2     8:22        171          92        103
          3     8:24        158          88        102
          4     8:30        155          90         93
          5     8:43        144          88         90
          6     8:51        145          82         88
          7     8:59        140          80         86
          8     9:02        138          82         84
          9     9:06        130          80         78
         10     9:09        130          76         75
         11     9:13        128          77         78
         12     9:18        126          75         73
         13     9:25        125          75         72
         14     9:31        122          73         74
         15     9:42        124          75         70
         16     9:45        123          73         68
         17     9:50        120          73         67
         18     9:52        115          70         67
         19     9:55        116          73         66
         20     9:59        115          68         65
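To run the program in this example, BLOODPRESSURE must exist. The following is a minimal sketch that builds it from the listing above. It assumes BPTIME is stored as a numeric SAS time value displayed with the TIME5. format; the source does not show how the variable is stored.

```sas
/* Assumption: BPTIME is a numeric time value; the source does not say. */
data bloodpressure;
  input bptime :time5. systolic diastolic pulse;
  format bptime time5.;
  datalines;
8:20 160 90 99
8:22 171 92 103
8:24 158 88 102
8:30 155 90 93
8:43 144 88 90
8:51 145 82 88
8:59 140 80 86
9:02 138 82 84
9:06 130 80 78
9:09 130 76 75
9:13 128 77 78
9:18 126 75 73
9:25 125 75 72
9:31 122 73 74
9:42 124 75 70
9:45 123 73 68
9:50 120 73 67
9:52 115 70 67
9:55 116 73 66
9:59 115 68 65
;
run;
```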

Resulting Data Set

Output 8.5 FIRSTLAST4BP Data Set

    Example 8.5 FIRSTLAST4BP Data Set Created with DATA Step

 Obs    measurement    bptime    systolic    diastolic    pulse
  1           1         8:20        160          90         99
  2           2         8:22        171          92        103
  3           3         8:24        158          88        102
  4           4         8:30        155          90         93
  5          17         9:50        120          73         67
  6          18         9:52        115          70         67
  7          19         9:55        116          73         66
  8          20         9:59        115          68         65


Example Overview

This example shows you how to write a DATA step that can select a specific number of observations from the beginning and the end of a data set. The DATA step accesses the observations directly by observation number by using the POINT= option in the SET statement; it does not process the data set sequentially.
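As a small illustration of direct access, the following sketch reads a single observation by number. The output data set name OBS10 and the variable name OBSNUM are hypothetical and chosen only for this illustration.

```sas
/* Read only observation 10 of BLOODPRESSURE by direct access. */
data obs10;                         /* hypothetical output data set name  */
  obsnum=10;                        /* hypothetical POINT= variable       */
  set bloodpressure point=obsnum;   /* OBSNUM is not written to OBS10     */
  output;
  stop;                             /* direct access has no end of file   */
run;
```

Because the POINT= variable is temporary, OBS10 contains only the variables that come from BLOODPRESSURE.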

Data set BLOODPRESSURE contains 20 blood pressure measurements for a person over a period of time.

The following DATA step selects the first four observations and last four observations from BLOODPRESSURE. The NOBS= option in the SET statement assigns to variable TOTALOBS the number of observations in BLOODPRESSURE.

The value of TOTALOBS is assigned during compilation before the DATA step executes. Therefore, the value of TOTALOBS is available at the top of the DATA step when execution commences.
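One way to see that the NOBS= value is set at compilation is a DATA step whose SET statement never executes. The sketch below uses the common IF 0 THEN SET idiom; it is an illustration and not part of the example program.

```sas
/* The SET statement never executes, yet TOTALOBS already holds a value */
/* because NOBS= is resolved when the DATA step is compiled.            */
data _null_;
  if 0 then set bloodpressure nobs=totalobs;
  put 'Observations in BLOODPRESSURE: ' totalobs;
  stop;
run;
```

For the BLOODPRESSURE data set shown in this example, the step should write a count of 20 to the SAS log without reading any observations.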

The IF-THEN block executes when the number of observations in BLOODPRESSURE is greater than or equal to the sum of the number of observations to read from the beginning and from the end of the data set. It contains a DO loop with two ranges of index values: the first specifies the range of observations to access at the beginning of the data set, and the second specifies the range of observations to access at the end of the data set. The DO loop in this example reads observations 1 through 4 and 17 through 20.
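The two-range form of the iterative DO statement can be tried on its own. The sketch below hard-codes TOTALOBS as 20, which is an assumption made only so that the step runs without reading a data set, and writes each index value to the log.

```sas
/* Show the index values produced by a DO loop with two ranges. */
data _null_;
  totalobs=20;                      /* assumed here; NOBS= supplies it in the example */
  total1=4;
  total2=4;
  start1=1;
  end1=start1+total1-1;             /* 4  */
  start2=totalobs-total2+1;         /* 17 */
  end2=totalobs;                    /* 20 */
  do i=start1 to end1,start2 to end2;
    put i=;
  end;
run;
```

The log shows I taking the values 1 through 4 and then 17 through 20, which are the observation numbers that the example accesses.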

If the check on the number of observations did not exist, and the first DO loop executed unconditionally when the sum of TOTAL1 and TOTAL2 was greater than TOTALOBS, your output data set would contain duplicates of some observations because of the overlap of the two ranges of observations. Additionally, the ranges that were specified might attempt to access observations that don't exist, which would result in errors.
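The overlap is easy to see with a hypothetical data set of six observations. The sketch below only computes the index values; it does not read any data, and the value 6 is an assumption for illustration.

```sas
/* With 6 observations and 4+4 requested, the two ranges overlap. */
data _null_;
  totalobs=6;                       /* hypothetical small data set */
  total1=4;
  total2=4;
  do i=1 to total1,totalobs-total2+1 to totalobs;
    put i=;                         /* 1 2 3 4, then 3 4 5 6 */
  end;
run;
```

Index values 3 and 4 appear in both ranges, so those observations would be output twice. With fewer than four observations, the first range would run past the last observation and the second range would start below observation 1.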

The ELSE-DO block executes when the total number of observations in the data set is less than the sum of the number of observations to access from the beginning and the end of the data set. The DO loop in this block simply reads each observation in the data set, so the output data set has the same number of observations as the input data set.

The DATA step iterates only once, and each DO loop iterates once for every observation that it reads. Because the step iterates only once, assignment statements are an acceptable way to assign the constant values to the variables that specify the DO loop index values. In a DATA step that iterates once for every observation in a data set, it is usually more efficient to assign constant values with a RETAIN statement, which sets the initial values once rather than on every iteration.
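For contrast, here is a minimal sketch of a DATA step that reads BLOODPRESSURE sequentially and uses RETAIN for a constant. The data set name FLAGGED and the variable INFIRST are hypothetical and are not part of the example.

```sas
/* Sequential read: the step iterates once per observation, so the   */
/* constant is initialized once with RETAIN instead of reassigned.   */
data flagged;                       /* hypothetical output data set name */
  retain total1 4;
  set bloodpressure;
  infirst=(_n_ le total1);          /* 1 for the first TOTAL1 observations, else 0 */
run;
```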

Program

Create data set FIRSTLAST4BP.

Assign constant values to the two variables that define how many observations to read from the beginning of the data set (TOTAL1) and from the end of the data set (TOTAL2).

Execute this block if the total number of observations in BLOODPRESSURE is greater than or equal to the sum of the total number of observations to access from BLOODPRESSURE.

Specify the lower index value of the range of observations to select from the beginning of the data set.

Compute the upper index value of the range of observations to select from the beginning of the data set.

Compute the lower index value of the range of observations to select from the end of the data set.

Specify the upper index value of the range of observations to select from the end of the data set.

Specify a DO loop with two sets of ranges, the first for the observations to access from the beginning of the data set and the second for the observations to access from the end of the data set.

Assign the observation number that is currently being accessed to variable MEASUREMENT. With I designated as the POINT= variable, SAS considers it a temporary variable. Therefore, to save its value in the output data set, it must be assigned to a data set variable.

Read the ith observation from BLOODPRESSURE. Assign the total number of observations in BLOODPRESSURE to variable TOTALOBS.

Output each observation that is accessed directly.

Execute this block when the total number of observations in BLOODPRESSURE is less than the sum of the total number of observations to access from BLOODPRESSURE.

Execute this loop once for each observation in BLOODPRESSURE.

Assign the observation number that is currently being accessed to variable MEASUREMENT.

Read the ith observation from data set BLOODPRESSURE. Assign the total number of observations in BLOODPRESSURE to variable TOTALOBS.

Output each observation that is accessed directly, which in this loop is every observation in BLOODPRESSURE.

Use a STOP statement to prevent the DATA step from looping continuously, because there is no end-of-file condition when SAS reads data with direct access.

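data firstlast4bp;
  keep measurement bptime systolic diastolic pulse;

  /* Number of observations to read from the beginning and the end */
  total1=4;
  total2=4;

  /* Enough observations for two ranges that do not overlap */
  if totalobs ge (total1+total2) then do;
    start1=1;
    end1=start1+total1-1;
    start2=totalobs-total2+1;
    end2=totalobs;
    do i=start1 to end1,start2 to end2;
      /* I is a temporary POINT= variable, so save its value */
      measurement=i;
      set bloodpressure point=i nobs=totalobs;
      output;
    end;
  end;
  /* Too few observations: read every observation once */
  else do;
    do i=1 to totalobs;
      measurement=i;
      set bloodpressure point=i nobs=totalobs;
      output;
    end;
  end;

  /* Direct access never reaches end of file, so stop explicitly */
  stop;
run;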
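To check the result against Output 8.5, the data set can be listed with PROC PRINT. The source does not show the step that produced the listing, so the following is an assumed sketch that reuses the title from the output.

```sas
/* List FIRSTLAST4BP; the step itself is an assumption, the title is from Output 8.5. */
proc print data=firstlast4bp;
  title 'Example 8.5 FIRSTLAST4BP Data Set Created with DATA Step';
run;
title;                              /* clear the title afterward */
```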
