Inputting Data Using the DATALINES Statement

Now that you know how to enter your data, you are ready to learn about the SAS statements that read the data and place them in a SAS dataset. There are several ways to input data, but this book focuses on two: the DATALINES statement, which lets you include the data within the SAS program itself, and the INFILE statement, which lets you read the data lines from an external file.

There are also several ways in which data can be read by SAS with regard to the instructions you provide concerning the location and format of your variables. Although SAS allows for list input, column input and formatted input, this text presents only formatted input because of its ability to easily handle many different types of data.

Here is the general form for inputting data using the DATALINES statement and the formatted input style:

DATA dataset-name;
   INPUT  #line-number   @column-number   variable-name  column-width.
                         @column-number   variable-name  column-width.
                         @column-number   variable-name  column-width. ;
DATALINES;
entered data are placed here
;
RUN;

PROC name-of-desired-statistical-procedure     DATA=dataset-name ;
RUN;

The following example shows a SAS program that analyzes the preceding dataset. The numbers on the far-left side are not part of the program; they are provided so that it is easy to refer to specific lines of the program in the explanations that follow.

 1       DATA D1;
 2          INPUT   #1   @1   Q1      1.
 3                       @2   Q2      1.
 4                       @3   Q3      1.
 5                       @4   Q4      1.
 6                       @5   Q5      1.
 7                       @6   Q6      1.
 8                       @7   Q7      1.
 9                       @9   AGE     2.
10                       @12  IQ      3.
11                       @16  NUMBER  2.  ;
12       DATALINES;
13       2234243 22  98  1
14       3424325 20 105  2
15       3242424 32  90  3
16       3242323  9 119  4
17       3232143  8 101  5
18       3242242 24 104  6
19       4343525 16 110  7
20       3232324 12  95  8
21       1322424 41  85  9
22       5433224 19 107 10
23       ;
24       RUN;
25
26       PROC MEANS   DATA=D1;
27       RUN;
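
For reference, here is the same program as it would be typed for submission, with the reference numbers removed. Nothing else is changed from the numbered listing; note that each data line begins in column 1, which is what the column controls in the INPUT statement assume.

```sas
DATA D1;
   INPUT   #1   @1   Q1      1.
                @2   Q2      1.
                @3   Q3      1.
                @4   Q4      1.
                @5   Q5      1.
                @6   Q6      1.
                @7   Q7      1.
                @9   AGE     2.
                @12  IQ      3.
                @16  NUMBER  2.  ;
DATALINES;
2234243 22  98  1
3424325 20 105  2
3242424 32  90  3
3242323  9 119  4
3232143  8 101  5
3242242 24 104  6
4343525 16 110  7
3232324 12  95  8
1322424 41  85  9
5433224 19 107 10
;
RUN;

PROC MEANS   DATA=D1;
RUN;
```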

A few important notes about these data input statements:

  • The DATA statement. Line 1 from the preceding program includes the DATA statement, where the general form is:

    DATA dataset-name;

    In this case, you gave your dataset the name D1, so the statement reads

    DATA D1;

  • Dataset names and variable names. The preceding paragraph stated that your dataset was assigned the name D1 on line 1 of the program. In lines 2 to 11 of the program, the dataset’s variables are assigned names such as Q1, Q2, AGE, and IQ.

    You are free to assign a dataset or variable any name you like so long as it conforms to the following rules:

    • It must begin with a letter or an underscore (rather than a number).

    • It contains no special characters such as “*” or “#” (the underscore is the one exception).

    • It contains no blank spaces.

    Although the preceding dataset is named D1, it could have been given many other names. Below are examples of other acceptable names for SAS datasets:

    SURVEY
    PARTICIPANT
    RESEARCH
    VOLUNTEER
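
    The naming rules above can be sketched with a short program. This is only an illustration: the dataset name SURVEY comes from the list above, the two data lines are a shortened, made-up layout (Q1 in column 1, AGE in columns 3 and 4), and the names in the closing comments are hypothetical examples of names that do not conform to the rules. The block is set flush left so that the data lines begin in column 1.

```sas
/* SURVEY, Q1, and AGE each begin with a letter and contain */
/* no special characters and no blank spaces.               */
DATA SURVEY;
   INPUT   #1   @1   Q1      1.
                @3   AGE     2.  ;
DATALINES;
2 22
3 20
;
RUN;

/* Names that do not conform to the rules:                     */
/*    1SURVEY    begins with a number                          */
/*    SURVEY#1   contains a special character                  */
/*    MY SURVEY  contains a blank, so SAS would not treat it   */
/*               as a single name                              */
```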

  • The INPUT statement. The INPUT statement has the following general form:

    INPUT  #line-number  @column-number  variable-name  column-width.
                         @column-number  variable-name  column-width.
                         @column-number  variable-name  column-width. ;

    Compare this general form to the actual INPUT statement that appears on lines 2 to 11 of the preceding SAS program, and note the values that were filled in to read your data. In the actual program, the word INPUT appears on line 2 and tells SAS that the INPUT statement has begun. SAS assumes that all of the instructions that follow are data input directions until it encounters a semicolon (;). At that semicolon, the INPUT statement ends. In this example, the semicolon appears at the end of line 11.

  • Line number controls. To the right of the word INPUT is the following:

    #line-number

    This tells SAS which line it should read in order to find specific variables. In some cases, there can be two or more lines of data for each participant; a later section covers that situation. For the present example, however, the situation is fairly simple: there is only one line of data for each participant, so your program includes the following line number control (from line 2 of the program example):

    INPUT   #1

    Technically, it is not necessary to include line number controls when there is only one line of data for each participant (as in the present case). In this text, however, line number controls appear for the sake of consistency.
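
    A minimal sketch of that point, assuming a shortened version of the example data (two data lines, three variables) and a hypothetical dataset name D2: the INPUT statement below omits #1, and because each participant has only one line of data, SAS should read the values just as it would with the line number control present. PROC PRINT, which is not covered in this section, is used here only to list the values that were read. The block is set flush left so that the data lines begin in column 1.

```sas
/* Same column controls as the main example, without #1 */
DATA D2;
   INPUT   @1   Q1      1.
           @9   AGE     2.
           @12  IQ      3.  ;
DATALINES;
2234243 22  98  1
3424325 20 105  2
;
RUN;

/* List the observations to check the values of Q1, AGE, and IQ */
PROC PRINT   DATA=D2;
RUN;
```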

  • Column location, variable name, and column width directions. To the right of the line number directions, you place the column location, variable name, and column width directions. The syntax for this is as follows:

    @column-number   variable-name   column-width.

    Where column-number appears above, you enter the number of the column in which a specific variable appears. If the variable occupies more than one column (such as IQ in columns 12, 13, and 14), you should enter the number of the column in which it begins (e.g., column 12). Where variable-name appears, you enter the name that you have given to that variable. Where column-width appears, you enter how many columns are occupied by that variable. In the case of the preceding data, the first variable is Q1, which appears in column 1 and is only one column wide. This program example, therefore, provides the following column location controls (from line 2):

    @1   Q1   1.

    The preceding line tells SAS to go to column 1. In that column, you find a variable called Q1. It is a number and it is one column wide.

    You must follow the column width with a period. For Q1, the width is 1, so it is written as “1.” in the INPUT statement. It is important that you include this period; later, you will learn how the period provides information about decimal places.

    Now that variable Q1 has been read, you must give SAS the directions required to read the remaining variables in the dataset. The completed INPUT statement appears as follows. Note that the line number controls are given only once because all of these variables come from the same line (for a given participant). However, there are different column controls for the different variables. Note also how column widths are different for AGE, IQ, and NUMBER:

    INPUT   #1   @1   Q1      1.
                 @2   Q2      1.
                 @3   Q3      1.
                 @4   Q4      1.
                 @5   Q5      1.
                 @6   Q6      1.
                 @7   Q7      1.
                 @9   AGE     2.
                 @12  IQ      3.
                 @16  NUMBER  2.  ;

    Notice the semicolon that appears after the column width entry for the last variable (NUMBER). You must always end your INPUT statement with a semicolon. It is easy to omit the semicolon, so always check for it if you get an error message following the INPUT statement. (More is said about error messages in later chapters.)

  • The DATALINES statement. The DATALINES statement goes after the INPUT statement and tells SAS that raw data are to follow. Don’t forget the semicolon after the word DATALINES. In the preceding program example, the DATALINES statement appears on line 12.

  • The data lines. The data lines, of course, are the lines that contain the participants’ values for the numeric and/or character variables. In the preceding program example, these appear on lines 13 to 22.

    The data lines should begin on the very next line after the DATALINES statement; there should be no blank lines. These data lines begin on line 13 in the preceding program example. On the very first line after the last of the data lines (line 23, in this case), you should add another semicolon to let SAS know that the data have ended. Do not place this semicolon at the end of the last line of data (i.e., on the same line as the data) as this might cause an error. On the line after this semicolon, place a RUN statement. In the preceding program example, this statement appears on line 24.

    With respect to the data lines, the most important thing to remember is that you must enter a given variable in the column specified by the INPUT statement. For example, if your INPUT statement contains the following line:

    @9   AGE   2.

    then make sure that the variable AGE really is a two-digit number found in columns 9 and 10.
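
    To see why column placement matters, consider the following sketch. It assumes a hypothetical dataset name D3 and two copies of the first data line from the example, the second typed with one extra leading blank so that every value sits one column too far to the right. With the column controls shown, SAS should read AGE from columns 9 and 10 of each line regardless of what was intended, so the second observation should come out with AGE as 2 and IQ as 9 rather than 22 and 98, and with Q1 missing because column 1 is blank. PROC PRINT, which is not covered in this section, is used only to list the values. The block is set flush left so that the first data line begins in column 1.

```sas
/* AGE is read from columns 9-10 and IQ from columns 12-14 */
DATA D3;
   INPUT   #1   @1   Q1      1.
                @9   AGE     2.
                @12  IQ      3.  ;
DATALINES;
2234243 22  98  1
 2234243 22  98  1
;
RUN;

/* Compare the two observations: the second was typed one column off */
PROC PRINT   DATA=D3;
RUN;
```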

  • PROC and RUN statements. There is little to say about PROC and RUN statements at this point because most of the remaining text is concerned with using such SAS procedures. Suffice it to say that a PROC (procedure) statement asks SAS to perform some statistical analysis. To keep things simple, this section uses a procedure called PROC MEANS. PROC MEANS asks SAS to calculate means, standard deviations, and other descriptive statistics for numeric variables. The preceding program includes the PROC MEANS statement on line 26.

    In most cases, your program ends with a RUN statement. In the preceding program example, a second RUN statement appears on line 27. A RUN statement executes any previously entered SAS statements; RUN statements are typically placed after every PROC statement. If your program includes a number of PROC statements in sequence, it is acceptable to place just one RUN statement after the final PROC statement.

    If you submitted the preceding program for analysis, PROC MEANS would produce the results presented in Output 3.1:

    Output 3.1. Results of the MEANS Procedure
    The MEANS Procedure
    
             Variable     N            Mean         Std Dev         Minimum         Maximum
             ------------------------------------------------------------------------------
             Q1          10       3.0000000       1.0540926       1.0000000       5.0000000
             Q2          10       2.6000000       0.8432740       2.0000000       4.0000000
             Q3          10       3.2000000       0.7888106       2.0000000       4.0000000
             Q4          10       2.6000000       0.8432740       2.0000000       4.0000000
             Q5          10       2.9000000       1.1972190       1.0000000       5.0000000
             Q6          10       2.6000000       0.9660918       2.0000000       4.0000000
             Q7          10       3.7000000       0.9486833       2.0000000       5.0000000
             AGE         10      20.3000000      10.2745641       8.0000000      41.0000000
             IQ          10     101.4000000       9.9241568      85.0000000     119.0000000
             NUMBER      10       5.5000000       3.0276504       1.0000000      10.0000000
             ------------------------------------------------------------------------------
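
    As noted above, a single RUN statement can follow several PROC statements in sequence. The following sketch illustrates this under some assumptions: it uses a hypothetical dataset name D4, only the first three data lines of the example, and only the AGE and IQ variables. PROC PRINT is not discussed in this section; it is included simply as a second procedure that lists the data. The block is set flush left so that the data lines begin in column 1.

```sas
DATA D4;
   INPUT   #1   @9   AGE     2.
                @12  IQ      3.  ;
DATALINES;
2234243 22  98  1
3424325 20 105  2
3242424 32  90  3
;
RUN;

/* Two procedures in sequence, followed by one RUN statement */
PROC MEANS   DATA=D4;
PROC PRINT   DATA=D4;
RUN;
```

    Placing a RUN statement after each PROC statement, as the main example does, works equally well and makes the end of each step explicit.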
