Placement of Data Manipulation and Data Subsetting Statements

The use of data-manipulation and data-subsetting statements is illustrated here with reference to the fictitious study described in the preceding chapter. In that chapter, you were asked to imagine that you had developed a 7-item questionnaire dealing with volunteerism, as shown in the following example.

Volunteerism Survey

Please indicate the extent to which you agree or disagree with each
of the following statements.  You will do this by circling the
appropriate number to the left of that statement.  The following
format shows what each response alternative represents:

     5 = Agree Strongly
     4 = Agree Somewhat
     3 = Neither Agree nor Disagree
     2 = Disagree Somewhat
     1 = Disagree Strongly

For example, if you "Disagree Strongly" with the first question,
circle the "1" to the left of that statement.  If you "Agree
Somewhat," circle the "4," and so on.

-------------
 Circle Your
  Response
-------------
1  2  3  4  5     1.    I feel a personal responsibility to help
                        needy people in my community.

1  2  3  4  5     2.    I feel I am personally obligated to help
                        homeless families.

1  2  3  4  5     3.    I feel no personal responsibility to work
                        with poor people in my community.

1  2  3  4  5     4.    Most of the people in my community are
                        willing to help the needy.

1  2  3  4  5     5.    A lot of people around here are willing to
                        help homeless families.

1  2  3  4  5     6.    The people in my community feel no personal
                        responsibility to work with poor people.

1  2  3  4  5     7.    Everyone should feel the responsibility to
                        perform volunteer work in his/her community.


What is your age in years? _______________


Assume that you administer this survey to a number of participants and you also obtain information concerning sex, IQ scores, GRE verbal test scores, and GRE math test scores for each participant. Once the data are entered, you might want to write a SAS program that includes some data-manipulation or data-subsetting statements to transform the raw data. But where within the SAS program should these statements appear?

In general, these statements should appear only within the DATA step. Remember that the DATA step begins with the DATA statement and ends when SAS encounters a step boundary, such as a RUN statement, another DATA statement, or a procedure (PROC) statement. This means that if you write the DATA step, follow it with a procedure, and then place a manipulation or subsetting statement immediately after the procedure, you will receive an error.

To avoid this error (and keep things simple), place your data-manipulation and data-subsetting statements in one of two locations within a SAS program:

  • immediately following the INPUT statement;

  • or immediately following the creation of a new dataset.
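The placement rule can be seen in a small sketch. The program below is an illustration only: it assumes a simplified version of the volunteerism data (list input, only the seven items plus AGE and SEX, three participants taken from the sample data lines), and the assignment statement shown is a hypothetical manipulation. The misplaced statement is left inside a comment so that the program still runs; if it were active at that point, it would fall outside any DATA step and SAS would report an error.

```sas
/* Simplified, illustrative version of the volunteerism dataset (list input). */
DATA D1;
   INPUT Q1-Q7 AGE SEX $;
DATALINES;
2 2 3 4 2 4 3 22 M
3 4 2 4 3 2 5 20 M
5 4 3 3 2 2 4 19 F
;
RUN;

PROC MEANS DATA=D1;
RUN;

/* MISPLACED (kept as a comment so the program runs):
      Q3 = 6 - Q3;
   Here the statement follows a procedure and belongs to no DATA step,
   so SAS would flag it as an error instead of modifying D1. */
```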

Immediately Following the INPUT Statement

The first of the two preceding guidelines indicates that the statements may be placed immediately following the INPUT statement. This guideline is illustrated again by referring to the volunteerism study. Assume that you prepare the following SAS program to analyze data obtained in your study. In the following program, lines 11 and 12 indicate where you can place data-manipulation or data-subsetting statements in that program. (To conserve space, only some of the data lines are reproduced in the program.)

 1      DATA D1;
 2         INPUT   #1   @1    Q1-Q7        1.
 3                      @9    AGE          2.
 4                      @12   IQ           3.
 5                      @16   NUMBER       2.
 6                      @19   SEX         $1.
 7                 #2   @1    GREVERBAL    3.
 8                      @5    GREMATH      3.  ;
 9
10
11      place data-manipulation statements and
12      data-subsetting statements here
13
14      DATALINES;
15      2234243 22  98  1 M
16      520 490
17      3424325 20 105  2 M
18      440 410
19        .
20        .
21
22      5433224 19 107 10 F
23      640 590
24      ;
25      RUN;
26
27      PROC MEANS  DATA=D1;
28      RUN;
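As a hedged sketch of what might occupy lines 11 and 12, the following program places one data-manipulation statement and one data-subsetting statement between the INPUT statement and the DATALINES statement. The specific statements are assumptions chosen for illustration (reverse-scoring the negatively worded item 3 on the 5-point scale, and keeping participants aged 20 or older), and the INPUT statement is simplified to list input with fewer variables so that the example stays short and self-contained.

```sas
DATA D1;
   INPUT Q1-Q7 AGE SEX $;

   /* Data manipulation (illustrative): reverse-score item 3,
      so that 1 becomes 5, 2 becomes 4, and so on. */
   Q3 = 6 - Q3;

   /* Data subsetting (illustrative): retain only participants
      whose age is 20 or greater. */
   IF AGE GE 20;

DATALINES;
2 2 3 4 2 4 3 22 M
3 4 2 4 3 2 5 20 M
5 4 3 3 2 2 4 19 F
;
RUN;

/* D1 now holds the modified, subsetted data. */
PROC PRINT DATA=D1;
RUN;
```

Because both statements sit inside the DATA step, they are applied to each observation as it is read, and the dataset D1 that the procedure sees already reflects them.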

Immediately after Creating a New Dataset

The second guideline provides another option: data-manipulation or data-subsetting statements can also be placed immediately following program statements that create a new dataset. A new dataset can be created at virtually any point in a SAS program (even after procedures are requested).

At times, you might want to create a new dataset so that, initially, it is identical to an existing dataset (perhaps the one created with a preceding INPUT statement). If data-manipulation or data-subsetting statements follow the creation of this new dataset, the new dataset reflects the modifications requested by those statements.

To create a new dataset that is identical to an existing dataset, the general form is

DATA new-dataset-name;
   SET existing-dataset-name;

For example, to create a new dataset called D2 that is identical to the existing dataset D1, use the following statements:

     DATA D2;
        SET D1;

These lines tell SAS to create a new dataset called D2 and to make this new dataset identical to D1. Now that a new dataset has been created, you can write as many manipulation and subsetting statements as you like. However, once you write a procedure, the DATA step ends, and you cannot write any more manipulation or subsetting statements beyond that point (unless you create another dataset later in the program).

The following is an example of how you might write your program so that the manipulation and subsetting statements follow the creation of the new dataset:

 1      DATA D1;
 2         INPUT   #1   @1    Q1-Q7        1.
 3                      @9    AGE          2.
 4                      @12   IQ           3.
 5                      @16   NUMBER       2.
 6                      @19   SEX         $1.
 7                 #2   @1    GREVERBAL    3.
 8                      @5    GREMATH      3.  ;
 9
10      DATALINES;
11      2234243 22  98  1 M
12      520 490
13      3424325 20 105  2 M
14      440 410
15        .
16        .
17
18      5433224 19 107 10 F
19      640 590
20      ;
21      RUN;
22
23      DATA D2;
24         SET D1;
25
26      place data manipulation statements and
27      data subsetting statements here
28
29      PROC MEANS  DATA=D2;
30      RUN;

SAS creates two datasets according to the preceding program: D1 contains the original data, and D2 is identical to D1 except for the modifications requested by the data-manipulation and data-subsetting statements.

Notice that the MEANS procedure in line 29 requests the computation of some simple descriptive statistics. These statistics are computed from the data in dataset D2 because DATA=D2 appears in the PROC MEANS statement. If the statement had instead specified DATA=D1, the analyses would have been performed on the original dataset.
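The two-dataset approach can be sketched as follows. This is an illustration under stated assumptions: the data are a simplified version of the volunteerism data read with list input, and the manipulation and subsetting statements (reverse-scoring item 3 and keeping only male participants) are hypothetical choices made for the example.

```sas
/* D1: the original data, left unmodified. */
DATA D1;
   INPUT Q1-Q7 AGE SEX $;
DATALINES;
2 2 3 4 2 4 3 22 M
3 4 2 4 3 2 5 20 M
5 4 3 3 2 2 4 19 F
;
RUN;

/* D2: starts as a copy of D1, then is modified. */
DATA D2;
   SET D1;

   /* Data manipulation (illustrative): reverse-score item 3. */
   Q3 = 6 - Q3;

   /* Data subsetting (illustrative): keep only male participants. */
   IF SEX = 'M';
RUN;

/* Descriptive statistics for the original data ... */
PROC MEANS DATA=D1;
RUN;

/* ... and for the modified, subsetted data. */
PROC MEANS DATA=D2;
RUN;
```

Keeping D1 untouched means the original values remain available for comparison, while every change is confined to D2.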

The INFILE Statement versus the DATALINES Statement

The preceding program illustrates the use of the DATALINES statement rather than the INFILE statement. The guidelines regarding the placement of data-modifying statements are the same regardless of which approach is followed: the data-manipulation or data-subsetting statements should immediately follow either the INPUT statement or the creation of a new dataset. When a program is written using the INFILE statement rather than the DATALINES statement, data-manipulation and data-subsetting statements should appear after the INPUT statement but before the first procedure. For example, if your data are entered into an external file called VOLUNTEER.DAT, you can write the following program. (Notice where the manipulation and subsetting statements are placed.)

     DATA D1;
        INFILE 'A:/VOLUNTEER.DAT';
        INPUT   #1   @1    Q1-Q7        1.
                     @9    AGE          2.
                     @12   IQ           3.
                     @16   NUMBER       2.
                     @19   SEX         $1.
                #2   @1    GREVERBAL    3.
                     @5    GREMATH      3.  ;

        /* place data-manipulation statements and
           data-subsetting statements here */

     PROC MEANS   DATA=D1;
     RUN;

In the preceding program, the data-modifying statements again come immediately after the INPUT statement but before the first procedure, consistent with earlier recommendations.
