Three Types of SAS Files

Subsequent chapters of this manual provide details on how to write a SAS program: how to create and manage data; how to request specific statistical procedures; and so forth. This chapter presents a short SAS program and discusses the resulting output. Little elaboration is offered.

The purpose of this chapter is to provide a very general sense of what is involved in submitting a SAS program and interpreting the results. You are encouraged to copy the program that appears in the following example, submit it for analysis, and verify that the resulting output matches the output reproduced here. This exercise will provide you with the big picture of SAS, and this perspective will facilitate learning the programming details presented in subsequent chapters.

You work with three types of “files” when using SAS: one file contains the SAS program; one contains the SAS log; and one contains the SAS output. The following sections discuss the differences among these files.

The SAS Program

A SAS program consists of a set of statements written by the researcher or programmer. These statements provide SAS with the data to be analyzed, tell it about the nature of these data, and indicate which statistical analyses should be performed on the data.

This section illustrates a simple SAS program by analyzing some fictitious data from a fictitious study. Assume that six high school students have taken the Graduate Record Examinations (GRE). This test provides two scores for each student: a score on the GRE verbal test and a score on the GRE math test. On both tests, scores can range from 200 to 800, with higher scores indicating higher achievement levels.

Assume that you now want to obtain simple descriptive statistics regarding the six students’ scores on these two tests. For example, what is their average score on the GRE verbal test or on the GRE math test? What is the standard deviation of the scores on the two tests?

To perform these analyses, you prepare the following SAS program:
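A sketch of this program, reconstructed from Log 2.1 and from the description in the sections that follow. The statements and the first two data lines come from the text. The data lines for participants 3 through 6 are not given in the text; the values shown are assumptions, chosen so that the program reproduces the statistics reported in Output 2.1.

```sas
/* Data lines 3 through 6 are assumed values (not given in the text), */
/* chosen to be consistent with the statistics in Output 2.1.         */
DATA D1;
   INPUT PARTICIPANT GREVERBAL GREMATH;
DATALINES;
1 520 490
2 610 590
3 470 450
4 410 390
5 510 460
6 580 350
;
RUN;
PROC MEANS DATA=D1;
   VAR GREVERBAL GREMATH;
RUN;
```

With these values, the GRE verbal scores sum to 3100 and the GRE math scores sum to 2730, giving means of 516.67 and 455.00 for six students.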

The preceding code shows that a SAS program consists of two parts: a DATA step, which is used to read data and create a SAS dataset, and a PROC step, which is used to process or analyze the data. The differences between these steps are described in the next two sections.

The DATA Step

In the DATA step, programming statements create and/or modify a SAS dataset. Among other things, these statements can:

  • provide a name for the dataset;

  • provide names for the variables to be included in the dataset;

  • provide the actual data to be analyzed.

In the preceding program, the DATA step begins with the DATA statement and ends with the semicolon and RUN statement. The RUN statement immediately precedes the PROC MEANS statement.

The first statement of the preceding program begins with the word DATA and specifies that SAS should create a dataset to be called D1. The next line contains the INPUT statement, which indicates that three variables will be contained in this dataset. The first variable will be called PARTICIPANT, and this variable will simply provide a participant number for each student. The second variable will be called GREVERBAL (for the GRE verbal test), and the third will be called GREMATH (for the GRE math test).

The DATALINES statement indicates that the lines containing your data will appear immediately after it. The first line after the DATALINES statement contains the data (test scores) for participant 1. You can see that this first data line contains the numbers 520 and 490, meaning that participant 1 received a score of 520 on the GRE verbal test and a score of 490 on the GRE math test. The next data line shows that participant 2 received a score of 610 for the GRE verbal and a score of 590 for the GRE math. The semicolon and RUN statement after the last data line signal the end of the data.

The PROC Step

In contrast to the DATA step, the PROC step includes programming statements that request specific statistical analyses of the data. For example, the PROC step might request that correlations be performed between all quantitative variables or might request that a t test be performed. (For more information, see Chapter 8, “t Tests: Independent Samples and Paired Samples.”) In the preceding example, the PROC step consists of the last three lines of the program.

The first line after the DATA step is the PROC MEANS statement. This requests that SAS use a procedure called MEANS to analyze the data. The MEANS procedure computes means, standard deviations, and other descriptive statistics for numeric variables in the dataset. Immediately after the words PROC MEANS are the words DATA=D1. This tells the system that the data to be analyzed are in a dataset named D1. (Remember that D1 is the name of the dataset just created.)

Following the PROC MEANS statement is the VAR statement, which includes the names of two variables: GREVERBAL and GREMATH. This requests that the descriptive statistics be performed on GREVERBAL (GRE verbal test scores) and GREMATH (GRE math test scores).

Finally, the last line of the program is the RUN statement that signals the end of the PROC step. If a SAS program requests multiple PROCs (procedures), you have two options for using the RUN statement:

  • You can place a separate RUN statement following each PROC statement.

  • You can place a single RUN statement following the last PROC statement.
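The two options can be sketched as follows. PROC PRINT, which lists the contents of a dataset, is not part of the preceding program and is used here only as a second procedure for illustration. The dataset holds just the two data lines described earlier.

```sas
/* Small illustrative dataset: only the two data lines described in the text. */
DATA D1;
   INPUT PARTICIPANT GREVERBAL GREMATH;
DATALINES;
1 520 490
2 610 590
;
RUN;

/* Option 1: a separate RUN statement after each PROC step. */
PROC MEANS DATA=D1;
   VAR GREVERBAL GREMATH;
RUN;
PROC PRINT DATA=D1;
RUN;

/* Option 2: a single RUN statement after the last PROC step.      */
/* The PROC PRINT statement itself marks the end of the MEANS step. */
PROC MEANS DATA=D1;
   VAR GREVERBAL GREMATH;
PROC PRINT DATA=D1;
RUN;
```

Both versions request the same analyses. Placing a RUN statement after every step makes the boundary of each step explicit, which can make the log easier to follow.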

What is the single most common programming error?

For new SAS users, the single most common error involves leaving off a required semicolon. Remember that every SAS statement must end with a semicolon. In the preceding program, notice that the DATA statement ends with a semicolon, as do the INPUT statement, the DATALINES statement, the PROC MEANS statement, the VAR statement, and the RUN statement. When you obtain an error in running a SAS program, one of the first things you should do is look over the program for missing semicolons.
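As a minimal illustration of this error, the sketch below shows a DATA statement with its semicolon left off, followed by the corrected version. The faulty lines are kept inside a comment so that the block still runs, and the dataset holds only the two data lines described earlier.

```sas
/* Incorrect: the semicolon after D1 is missing, so SAS reads the
   two lines below as one long DATA statement instead of a DATA
   statement followed by an INPUT statement.

   DATA D1
      INPUT PARTICIPANT GREVERBAL GREMATH;
*/

/* Correct: each statement ends with its own semicolon. */
DATA D1;
   INPUT PARTICIPANT GREVERBAL GREMATH;
DATALINES;
1 520 490
2 610 590
;
RUN;
```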


Once you submit the preceding program for analysis, SAS creates two types of files reporting the results of the analysis. One file is called the SAS log or log file in this text. This file contains notes, warnings, error messages, and other information related to the execution of the SAS program. The other file is referred to as the SAS output file. The SAS output file contains the results of the requested statistical analyses.

The SAS Log

The SAS log is a listing of notes and messages that help you verify that your SAS program was executed successfully. Specifically, the log provides the following:

  • a reprinting of the SAS program that was submitted;

  • a listing of notes indicating how many variables and observations are contained in the dataset;

  • a listing of any errors made in the execution of the SAS program.

Log 2.1 provides a reproduction of the SAS log for the preceding program:

Log 2.1. SAS Log for the Preceding Program
NOTE: Copyright (c) 2002-2003 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) 9.1 (TS1M0)
      Licensed to NORM O'ROURKE, Site 0042223001.
NOTE: This session is executing on the WIN_PRO  platform.


NOTE: SAS initialization used:
      real time           23.83 seconds
      cpu time            4.47 seconds

1    DATA D1;
2    INPUT PARTICIPANT GREVERBAL GREMATH;
3
4    DATALINES;

NOTE: The data set WORK.D1 has 6 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           1.39 seconds
      cpu time            0.27 seconds


11   ;
12   RUN;
13   PROC MEANS  DATA=D1;
14      VAR GREVERBAL GREMATH;
15   RUN;

NOTE: There were 6 observations read from the data set WORK.D1.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           1.09 seconds
      cpu time            0.25 seconds

Notice that the statements constituting the SAS program are assigned line numbers and are reproduced in the SAS log. The data lines are not normally reproduced as part of the SAS log unless they are specifically requested.
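One way to request the data lines in the log is the LIST statement, which writes the input record currently being processed to the SAS log. The sketch below is only one possible approach and is not necessarily the method the text has in mind; it uses the two data lines described earlier.

```sas
DATA D1;
   INPUT PARTICIPANT GREVERBAL GREMATH;
   LIST;   /* write each input data line to the SAS log */
DATALINES;
1 520 490
2 610 590
;
RUN;
```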

About halfway down the log, a note indicates that the dataset contains six observations and three variables. You would check this note to verify that the dataset contains all of the variables that you intended to input (in this case, three) and that it contains data from all of your participants (in this case, six). So far, everything appears to be correct.

If you made any errors in writing the SAS program, there would also have been ERROR messages in the SAS log. Often, these error messages help you determine what was wrong with the program. For example, a message might indicate that SAS was expecting a program statement that was not included. Whenever you encounter an error message, read it carefully and review all of the program statements that preceded it. Often, the error appears in the program statements that immediately precede the error message; in other cases, the error might be hidden much earlier in the program.

If more than one error message is listed, do not panic; there still might be only one error. Sometimes, a single error causes a large number of subsequent error messages.

Once the error or errors are identified, you must revise the original SAS program and resubmit it for analysis. Review the new SAS log to see if the errors have been eliminated. If the log indicates that the program ran correctly, you are free to review the results of the analyses in the SAS output file.

The SAS Output File

The SAS output file contains the results of the statistical analyses requested in the SAS program. Because the program in the previous example requested the MEANS procedure, the corresponding output file contains means and other descriptive statistics for the variables analyzed. In this text, the SAS output file is sometimes referred to as the lst file. “Lst” is an abbreviation for “listing of results.”

The following is a reproduction of the SAS output file that would be produced from the preceding SAS program:

Output 2.1. Results of the MEANS Procedure
                                  The SAS System

                               The MEANS Procedure

 Variable     N            Mean         Std Dev         Minimum          Maximum
 -------------------------------------------------------------------------------
 GREVERBAL    6     516.6666667      72.5718035     410.0000000      610.0000000
 GREMATH      6     455.0000000      83.3666600     350.0000000      590.0000000
 -------------------------------------------------------------------------------

Below the heading “Variable,” SAS prints the name of each variable being analyzed. In this case, the variables are called GREVERBAL and GREMATH. Descriptive statistics for the GRE verbal test appear to the right of GREVERBAL, and those for the GRE math test appear to the right of GREMATH.

Below the heading “N,” the number of observations analyzed is reported. The average score on each variable is reproduced under “Mean” and standard deviations appear in the column “Std Dev.” Minimum and maximum scores for the two variables appear in the remaining two columns. You can see that the mean score on the GRE verbal test was 516.67, and the standard deviation of these scores was 72.57. For the GRE math test, the mean was 455.00, and the standard deviation was 83.37.

SAS output generally reports findings to several decimal places. In the body of this text, however, numbers will be reported to only two decimal places (rounded where necessary) in keeping with the publication manual of the American Psychological Association (APA, 2001).


The statistics included in the preceding output are printed by default (i.e., without asking for them specifically). Later in this text, you will learn that there are many additional statistics that you can request as options with PROC MEANS.
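As a brief sketch of how such requests look, statistic keywords can be listed in the PROC MEANS statement, in which case they replace the default set. The particular keywords chosen here (MEDIAN and RANGE in addition to the defaults) are only an illustration, and MAXDEC=2 limits the printed values to two decimal places. The dataset holds just the two data lines described earlier.

```sas
DATA D1;
   INPUT PARTICIPANT GREVERBAL GREMATH;
DATALINES;
1 520 490
2 610 590
;
RUN;

/* Request specific statistics and print them to two decimal places. */
PROC MEANS DATA=D1 N MEAN STD MIN MAX MEDIAN RANGE MAXDEC=2;
   VAR GREVERBAL GREMATH;
RUN;
```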
