Reading Nonstandard Data with List Input

The Basics of Modifying List Input

You can make list input more versatile by using modified list input. There are two modifiers that can be used with list input.
  • The ampersand (&) modifier is used to read character values that contain embedded blanks.
  • The colon (:) modifier is used to read nonstandard data values and character values that are longer than eight characters, but which contain no embedded blanks.
You can use modified list input to read the file shown below. This file contains the names of the 10 largest U.S. cities ranked in order based on their 2016 estimated population figures.
Notice that some of the values for city names contain embedded blanks and are followed by two blanks. Also, note that the values representing the population of each city are nonstandard numeric values (they contain commas).
Figure 18.24 Raw Data File Topten
Raw data that shows some of the values for city names containing embedded blanks and followed by two blanks, and shows values that contain nonstandard numeric values.

Reading Values That Contain Embedded Blanks

The ampersand (&) modifier enables you to read character values that contain single embedded blanks. The & indicates that a character value that is read with list input might contain one or more single embedded blanks. The value is read until two or more consecutive blanks are encountered. The & modifier precedes a specified informat if one is used.
input Rank City &;

Using the & Modifier with a LENGTH Statement

As shown below, you can use a LENGTH statement to define the length of City, and then add an & modifier to the INPUT statement to indicate that the values contain embedded blanks.
data sasuser.cityrank;
   infile topten;
   length City $ 12;
   input Rank City &;
run;
Figure 18.25 Raw Data File Topten
Raw data that shows values that contain embedded blanks.
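The same technique can be tried without an external file. The following is a minimal sketch, assuming in-stream data in place of the topten fileref; the data set name work.cityrank_demo and the two records are illustrative and are not taken from the Topten file.

```sas
data work.cityrank_demo;
   /* Declare City as a 12-byte character variable before INPUT */
   length City $ 12;
   /* & tells list input to keep reading City across single blanks */
   /* and to stop only at two or more consecutive blanks           */
   input Rank City &;
   datalines;
1  NEW YORK
2  LOS ANGELES
;
run;

proc print data=work.cityrank_demo noobs;
run;
```

Because the LENGTH statement appears before the INPUT statement, City is defined first and appears before Rank in the printed output.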

Using the & Modifier with an Informat

You can also read the values for City with the & modifier followed by the $w. informat, which reads standard character values, as shown below. When you do this, the w value in the informat determines the variable's length and should be large enough to accommodate the longest value.
Note: SAS reads until it encounters two consecutive blanks, the defined length of the variable, or the end of the input line, whichever comes first.
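A minimal sketch of this approach, assuming the same topten fileref and sasuser.cityrank data set that are used elsewhere in this section:

```sas
data sasuser.cityrank;
   infile topten;
   /* & precedes the informat; $12. sets the length of City to 12 */
   input Rank City & $12.;
run;
```

No LENGTH statement is needed here, because the $12. informat supplies the length of City.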
Figure 18.26 Raw Data File Topten
Raw data that shows standard character values.
Tip
Use two consecutive blanks as delimiters when you use the & modifier. You cannot use any other delimiter to indicate the end of each field.

Reading Nonstandard Values

The colon (:) modifier enables you to read nonstandard data values and character values that are longer than eight characters, but which contain no embedded blanks. The : indicates that values are read until a blank (or other delimiter) is encountered, and then an informat is applied. If an informat for reading character values is specified, the w value specifies the variable's length, overriding the default length.
Notice the values representing the 2016 population of each city in the raw data file below. Because they contain commas, these values are nonstandard numeric values.
Figure 18.27 Raw Data File Topten
Raw data that shows values that contain nonstandard numeric values.
In order to read these values, you can modify list input with the colon (:) modifier, followed by the COMMAw.d informat, as shown in the program below. Notice that the COMMAw.d informat does not specify a w value.
data sasuser.cityrank;
   infile topten;
   input Rank City & $12.
         Pop86 : comma.;
run;
Remember that list input reads each value until the next blank is detected. The default length of numeric variables is 8 bytes, so you do not need to specify a w value to indicate the length of a numeric variable.
This is different from using a numeric informat with formatted input. In that case, you must specify a w value in order to indicate the number of columns to be read.
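The contrast can be sketched with two short DATA steps. This is an illustration only: the data set names, the variable name Pop, and the rounded population values are placeholders, not figures from the Topten file.

```sas
/* Modified list input: the value is read up to the next blank, */
/* so the COMMA informat needs no w value                       */
data work.pop_list;
   input Pop : comma.;
   datalines;
8,500,000
3,900,000
;
run;

/* Formatted input: the informat must state how many columns to read. */
/* Each placeholder value occupies columns 1-9, hence COMMA9.         */
data work.pop_fmt;
   input @1 Pop comma9.;
   datalines;
8,500,000
3,900,000
;
run;
```

Both steps should store the same numeric values. The difference lies in who decides where a value ends: the delimiter in the first step, the w value in the second.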

Processing the DATA Step

At compile time, the informat $12. in the example below sets the length of City to 12 and stores this information in the descriptor portion of the data set. During the execution phase, however, the w value of 12 does not determine the number of columns that are read. This is different from the function of informats in the formatted input style.
data sasuser.cityrank; 
   infile topten; 
   input Rank City & $12. 
         Pop86 : comma.; 
run;
Figure 18.28 Reading Raw Data File with Character Values That Are Longer Than 8
Raw data that shows character values with a length greater than 8.
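One way to see the difference between the stored length and the number of columns actually read is to compare the VLENGTH and LENGTH functions. The sketch below assumes in-stream data with placeholder population values; the data set name and the variables Pop, StoredLength, and ValueLength are hypothetical names introduced for illustration.

```sas
data work.length_demo;
   input Rank City & $12.
         Pop : comma.;
   /* VLENGTH returns the length assigned to City at compile time */
   StoredLength = vlength(City);
   /* LENGTH returns the position of the last nonblank character  */
   /* in the value read for the current record                    */
   ValueLength = length(City);
   datalines;
1  NEW YORK  8,500,000
2  LOS ANGELES  3,900,000
;
run;

proc print data=work.length_demo noobs;
run;
```

StoredLength should be 12 in every observation, while ValueLength varies with the city name (8 for NEW YORK, 11 for LOS ANGELES).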
The & modifier indicates that the values for City should be read until two consecutive blanks are encountered. Therefore, the value NEW YORK is read from column 4 to column 11, a total of only 8 columns. When blanks are encountered in both columns 12 and 13, the value NEW YORK is written to the program data vector.
data sasuser.cityrank; 
   infile topten; 
   input Rank City & $12. 
         Pop86 : comma.; 
run;
Figure 18.29 Reading City Value from Raw Data File
Raw data that shows the pointer at the end of New York, and output that shows the value New York written to the program data vector.
The input pointer moves forward to the next nonblank column, which is column 14 in the first record. Now the values for Pop86 are read from column 14 until the next blank is encountered. The COMMAw.d informat removes the commas, and the value is written to the program data vector.
data sasuser.cityrank;
   infile topten;
   input Rank City & $12.
         Pop86 : comma.;
run;
Figure 18.30 Reading Raw Data File POP16 Value
Raw data that shows the pointer at the next nonblank column, and output that shows the value New York and the population value written to the program data vector.
Notice that the character values for City and the numeric values for Pop86 are stored correctly in the data set.
Figure 18.31 SAS Data Set Cityrank
Output that shows the values stored correctly in the data set.

Comparing Formatted Input and Modified List Input

As you have seen, informats work differently in modified list input than they do in formatted input. With formatted input, the informat determines both the length of character variables and the number of columns that are read. The same number of columns is read from each record.
input @3 City $12.;
Figure 18.32 Raw Data Showing That the Same Number of Columns Are Read from Each Record
Raw data that shows the same number of columns are read from each record.
The informat in modified list input determines only the length of the variable, not the number of columns that are read. Here, the raw data values are read until two consecutive blanks are encountered.
input City & $12.;
Figure 18.33 Raw Data Showing That Values Are Read until Two Consecutive Blanks Are Encountered
Raw data that shows the raw data values are read until two consecutive blanks are encountered.
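The two styles can be compared on the same records. The sketch below is an illustration under stated assumptions: the data are in-stream placeholders laid out so that the city name starts in column 4, and the names work.compare_styles, CityFixed, CityList, and Pop are hypothetical.

```sas
data work.compare_styles;
   /* Formatted input: always reads columns 4-15, whatever they hold.  */
   /* Modified list input: reads until two consecutive blanks; $12.    */
   /* only sets the variable's length. @1 returns the pointer to       */
   /* column 1 so that the same record can be read a second time.      */
   input @4 CityFixed $12.
         @1 Rank CityList & $12.
         Pop : comma.;
   datalines;
1  NEW YORK  8,500,000
2  LOS ANGELES  3,900,000
;
run;

proc print data=work.compare_styles noobs;
run;
```

In the first record, columns 4 through 15 contain NEW YORK, two blanks, and the first two characters of the population value, so CityFixed picks up part of the next field, while CityList stops at the two consecutive blanks and holds only NEW YORK.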
Last updated: January 10, 2018