DATA Step Statements for Reading Data

Naming the Data Set with the DATA Statement

The DATA statement indicates the beginning of the DATA step and names the SAS data set to be created.
Syntax, DATA statement:
DATA SAS-data-set-1 <...SAS-data-set-n>;
SAS-data-set names (in the format libref.filename) the data set or data sets to be created.
Remember that a permanent SAS data set name is a two-level name. For example, the two-level name Clinic.Admit specifies that the data set Admit is stored in the permanent SAS library to which the libref Clinic has been assigned.

Specifying the Raw Data File with the INFILE Statement

When reading raw data, use the INFILE statement to indicate which file the data is in.
Syntax, INFILE statement:
INFILE file-specification <options>;
  • file-specification can take the form fileref to name a previously defined file reference or 'filename' to point to the actual name and location of the file.
  • options describes the input file's characteristics and specifies how it is to be read with the INFILE statement.
To read the raw data file to which the fileref Tests has been assigned, you write the following INFILE statement:
infile tests;
Tip
Instead of using a FILENAME statement, you can identify the raw data file by specifying the entire filename and location in the INFILE statement. For example, the following statement points directly to the C:\Irs\Personal\Refund.dat file:
infile 'c:\irs\personal\refund.dat';
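The two ways of pointing INFILE at a file can be sketched side by side. This is a minimal illustration rather than code from the original example: the path is a placeholder, and the null INPUT statement with PUT _INFILE_ is used only to echo each record to the log without assuming anything about the file layout.

```sas
/* Placeholder path: replace it with a raw data file that exists on your system. */
filename tests 'c:\myfiles\tests.dat';

/* Form 1: INFILE names the fileref defined by the FILENAME statement. */
data _null_;
   infile tests;
   input;          /* null INPUT statement: loads the next record into the input buffer */
   put _infile_;   /* writes that record to the SAS log */
run;

/* Form 2: INFILE names the file directly, so no FILENAME statement is needed. */
data _null_;
   infile 'c:\myfiles\tests.dat';
   input;
   put _infile_;
run;
```

Both steps read the same file. The fileref form is convenient when several steps read the file, because the path is written only once.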
Tip
When creating a temporary data set in the Work library, it is permissible to specify only the data set name and omit the Work library name.
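As a brief sketch of this tip, the two DATA steps below both create data sets in the Work library. The data set names and the variable x are made up for illustration, and the second step assumes the default setup in which one-level names resolve to Work (no User library has been defined).

```sas
/* Two-level name: the Work libref is written out. */
data work.demo_a;
   x = 1;   /* hypothetical variable so that the data set has one observation */
run;

/* One-level name: SAS treats this as work.demo_b by default. */
data demo_b;
   x = 1;
run;

/* List the members of Work to confirm that both data sets are there. */
proc contents data=work._all_ nods;
run;
```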

Column Input

Column input is one of several methods for reading raw data. It specifies the actual column locations of the values, so it is appropriate only in certain situations. When you use column input, your data must meet these conditions:
  • It must contain only standard character or numeric values.
  • It must be arranged in fixed fields.
The following external file contains data that is arranged in columns or fixed fields. You can specify a beginning and ending column for each field.
Figure 6.5 External File with Columns
An external file that contains data that is arranged in columns.

Standard and Nonstandard Numeric Data

Standard numeric data values can contain only the following text and characters:
  • numbers
  • decimal points
  • numbers in scientific or E notation (2.3E4, for example)
  • plus or minus signs
Nonstandard numeric data cannot be read by column input. It includes the following:
  • values that contain special characters, such as percent signs (%), dollar signs ($), and commas (,)
  • date and time values
  • data in fraction, integer binary, real binary, and hexadecimal forms
The following external file contains the personnel information for a technical writing department of a small computer manufacturer. The fields contain values for each employee's last name, first name, job code, and annual salary.
Notice that the salary values contain commas. So, the salary values are considered to be nonstandard numeric values. You cannot use column input to read these values.
Figure 6.6 Raw Data File
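A small experiment can make the distinction concrete. The sketch below uses made-up in-stream records (not the personnel file in the figure) and hypothetical variable names. DATALINES is used only so that the step runs without an external file.

```sas
data work.numeric_check;
   /* Columns 1-5 hold standard values; columns 8-13 hold nonstandard ones. */
   input Plain 1-5 Formatted 8-13;
   datalines;
25000  25,000
1.5E3  $1,500
;
run;

proc print data=work.numeric_check;
run;
```

Plain should be read as 25000 and 1500. Because column input expects standard numeric data, the values that contain a comma or a dollar sign should come back as missing values for Formatted, with an invalid-data note in the log.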

Describing the Data with the INPUT Statement

The INPUT statement describes the fields of raw data to be read and placed into the SAS data set.
Syntax, INPUT statement using column input:
INPUT variable <$> startcol-endcol . . .;
  • variable is the SAS name that you assign to the field.
  • The dollar sign ($) identifies the variable type as character (if the variable is numeric, then nothing appears here).
  • startcol represents the starting column for this variable.
  • endcol represents the ending column for this variable.
Here is a small data file.
Figure 6.7 Raw Data File
The INPUT statement below assigns the character variable ID to the data in columns 1-4, the numeric variable Age to the data in columns 6-7, the character variable ActLevel to the data in columns 9-12, and the character variable Sex to the data in column 14.
filename exer 'Z:\sasuser\exer.dat';
data exercise; 
   infile exer; 
   input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14; 
run;
proc print data=exercise;
run;
Figure 6.8 Assigning Column Ranges to Variables
When you use column input, you can do the following:
  • read any or all fields from the raw data file
  • read the fields in any order
  • specify only the starting column for values that occupy only one column
     input ActLevel $ 9-12 Sex $ 14 Age 6-7;
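These options can be tried in a self-contained sketch. The records are made-up sample values laid out like the exercise file (ID in columns 1-4, Age in 6-7, ActLevel in 9-12, Sex in 14). The temporary fileref exertmp and the DATA _NULL_ step that writes the records are scaffolding added for illustration, not part of the original example.

```sas
/* Scaffolding: create a temporary raw data file with three made-up records. */
filename exertmp temp;

data _null_;
   file exertmp;
   put '1001 25 MOD  F';
   put '1002 38 HIGH M';
   put '1003 42 LOW  F';
run;

/* Read three of the four fields, in a different order than they appear.
   ID (columns 1-4) is skipped; Sex needs only its single column number. */
data work.exercise_subset;
   infile exertmp;
   input ActLevel $ 9-12 Sex $ 14 Age 6-7;
run;

proc print data=work.exercise_subset;
run;
```

The variables are stored in the order in which the INPUT statement names them, so ActLevel comes first in the resulting data set.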
Tip
Remember that when you name a new variable, you must specify the name in the exact case that you want it stored (for example, NewBalance). Thereafter, you can specify the name in uppercase, lowercase, or mixed case letters.

Specifying Variable Names

Each variable has a name that conforms to SAS naming conventions. Variable names follow these rules for naming:
  • They can be 1 to 32 characters long.
  • They must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_).
  • They can continue with any combination of numbers, letters, or underscores.
Note: It is a best practice to restrict variable names by setting the global option VALIDVARNAME=V7 and to follow the variable naming rules.
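One way to check candidate names against these rules is the NVALID function, which returns 1 when a string is a valid SAS variable name under the given rule set and 0 otherwise. The names tested below are made up for illustration.

```sas
data _null_;
   length name $ 40;
   /* The first three names follow the rules; the last two break them
      (one starts with a digit, the other contains a blank). */
   do name = 'Age', '_total', 'ActLevel2', '2ndVisit', 'job code';
      ok = nvalid(strip(name), 'V7');   /* 1 = valid under V7 rules, 0 = not valid */
      put name= ok=;
   end;
run;
```

The step writes one line per name to the log, which makes it easy to screen a list of proposed names before using them in an INPUT statement.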
The INPUT statement uses column input to read the three data fields in the raw data file below.
Figure 6.9 Raw Data File
The values for the variable Age are located in columns 1-2. Because Age is a numeric variable, you do not specify a dollar sign ($) after the variable name.
input Age 1-2
The values for the variable ActLevel are located in columns 3-6. You specify a $ to indicate that ActLevel is a character variable.
input Age 1-2 ActLevel $ 3-6
The values for the character variable Sex are located in column 7. Notice that you specify only a single column.
input Age 1-2 ActLevel $ 3-6 Sex $ 7;
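Put together in a DATA step, the finished statement might be used as follows. The two records are made up (the contents of the raw data file in the figure are not reproduced here), the data set name is hypothetical, and DATALINES stands in for the external file so that the sketch runs on its own.

```sas
data work.exercise_packed;
   /* Fields are adjacent with no blanks between them:
      Age in columns 1-2, ActLevel in 3-6, Sex in 7. */
   input Age 1-2 ActLevel $ 3-6 Sex $ 7;
   datalines;
27HIGHF
34LOW M
;
run;

proc print data=work.exercise_packed;
run;
```

Because the column ranges tell SAS exactly where each value starts and ends, no delimiter is needed between the fields.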
Last updated: January 10, 2018