Determine whether a variable is character or numeric to ensure that you have the right type of data for your application.
Featured Step | DATA step |
Featured Step Options and Statements | ATTRN, CLOSE, INDEX, OPEN, UPCASE, VARNAME, and VARTYPE functions CALL SYMPUTX routine Open code macro variables |
Related Technique | PROC SQL, DICTIONARY.COLUMNS table, INTO clause to create a macro variable, INDEX and UPCASE functions |
A Closer Look | Using the SAS File I/O Functions Specifying Variable Lists |
Data set ENTRANCE_EXAM stores exam results for 15 applicants to an academic program. The results are saved in a mix of character and numeric variables.
ENTRANCE_EXAM Obs id examversion examdate essay speech mathematics physics vocabulary logic spatial athletics 1 MXMI V4.1 07/02/2010 Meets Below 19 17 14 Exceeds 14 Below 2 URQU V4.3 07/31/2010 Meets Meets 2 14 2 Meets 14 Meets 3 JOPE V4.2 07/13/2010 Exceeds Exceeds 5 8 4 Below 17 Exceeds 4 TYJP V4.3 07/21/2010 Meets Exceeds 14 6 10 Meets 18 Meets 5 YPDF V4.3 07/30/2010 Exceeds Exceeds 6 2 7 Meets 17 Meets 6 PRGH V4.3 07/01/2010 Below Below 12 7 15 Below 9 Meets 7 AYPX V4.3 07/06/2010 Below Below 20 5 6 Meets 12 Exceeds 8 DKEG V4.1 07/23/2010 Meets Exceeds 4 4 1 Meets 14 Exceeds 9 UPYM V4.1 07/03/2010 Below Meets 2 17 9 Below 16 Below 10 THCL V4.1 07/23/2010 Meets Exceeds 7 19 8 Meets 19 Below 11 SSPN V4.2 07/20/2010 Exceeds Meets 6 2 8 Exceeds 14 Exceeds 12 YRNJ V4.2 07/25/2010 Exceeds Meets 2 14 17 Below 12 Exceeds 13 MHVT V4.1 07/22/2010 Meets Meets 10 18 4 Exceeds 6 Below 14 TVKT V4.2 07/19/2010 Meets Exceeds 7 13 10 Exceeds 9 Below 15 HXRV V4.1 07/27/2010 Below Exceeds 10 6 13 Meets 15 Below
Output 9.3a Frequencies of Specific Character Variables in ENTRANCE_EXAMExample 9.3 ENTRANCE_EXAM The FREQ Procedure Cumulative Cumulative examversion Frequency Percent Frequency Percent ---------------------------------------------------------------- V4.1 6 40.00 6 40.00 V4.2 4 26.67 10 66.67 V4.3 5 33.33 15 100.00 Cumulative Cumulative essay Frequency Percent Frequency Percent ---------------------------------------------------------------- Below 4 26.67 4 26.67 Exceeds 4 26.67 8 53.33 Meets 7 46.67 15 100.00 Cumulative Cumulative speech Frequency Percent Frequency Percent ---------------------------------------------------------------- Below 3 20.00 3 20.00 Exceeds 7 46.67 10 66.67 Meets 5 33.33 15 100.00 Cumulative Cumulative logic Frequency Percent Frequency Percent ---------------------------------------------------------------- Below 4 26.67 4 26.67 Exceeds 4 26.67 8 53.33 Meets 7 46.67 15 100.00 Cumulative Cumulative athletics Frequency Percent Frequency Percent ---------------------------------------------------------------- Below 6 40.00 6 40.00 Exceeds 5 33.33 11 73.33 Meets 4 26.67 15 100.00 |
Output 9.3b PROC MEANS of Specific Numeric Variables in ENTRANCE_EXAMExample 9.3 ENTRANCE_EXAM The MEANS Procedure Variable N Mean Std Dev Minimum Maximum ----------------------------------------------------------------------- mathematics 15 8.4000000 5.7669006 2.0000000 20.0000000 physics 15 10.1333333 6.0811966 2.0000000 19.0000000 vocabulary 15 8.5333333 4.7639519 1.0000000 17.0000000 spatial 15 13.7333333 3.6344909 6.0000000 19.0000000 ----------------------------------------------------------------------- |
The DATA step in this example examines the names and type attribute of variables in a data set in order to select character variables to analyze with PROC FREQ and to select numeric variables to analyze with PROC MEANS.
Data set ENTRANCE_EXAM contains exam results for 15 applicants to an academic program. The results are a mix of character and numeric variables. The goal is to produce a frequency table for each character exam result variable and to compute descriptive statistics for each numeric exam result variable.
The DATA step applies several data set metadata functions that access data set ENTRANCE_EXAM and evaluate its variable attributes. It does not read the observations in ENTRANCE_EXAM, and it does not create an output data set. The DATA step instead iterates only once and applies the following functions to read descriptive information about the data set and select the analysis variables:
open the data set with OPEN
determine the number of variables with ATTRN
determine the variable name with VARNAME
determine the variable type with VARTYPE
close the data set with CLOSE
An IF statement excludes from the lists variable names that contain specific text. At the end of the DATA step after all variables have been examined, CALL SYMPUTX saves the names of the selected character variables in global macro variable CHARLIST and the names of the selected numeric variables in global macro variable NUMLIST.
The TABLES statement in PROC FREQ specifies macro variable CHARLIST, which resolves so that PROC FREQ produces a one-way frequency table for each selected character variable. The VAR statement in PROC MEANS specifies macro variable NUMLIST, which resolves so that PROC MEANS produces descriptive statistics for each selected numeric variable.
Start a DATA step. Do not create an output data set. Open data set ENTRANCE_EXAM for input only. Save in DSID the unique data set identifier that OPEN assigns.Write an error message to the SAS log and stop the DATA step when the DSID value indicates that the data set could not be opened.
Define CHARVARS and NUMVARS to be long enough to hold all of the names of the character variables (CHARVARS) and the names of the numeric variables (NUMVARS). Determine the total number of variables in ENTRANCE_EXAM. Execute a DO loop the number of times equal to the number of variables in ENTRANCE_EXAM. Examine the attributes of a variable on each iteration of the loop. Obtain the name of the variable in ENTRANCE_EXAM in the position in the data set data vector that corresponds to the current value of I. Reject variables whose names contain the text string 'ID' or 'DATE'. Return to the top of the DO loop.
If the ith variable in the data set data vector is character, add its name to the list of variables that are saved in CHARVARS. Separate the names with spaces. If the ith variable in the data set data vector is numeric, add its name to the list of variables that are saved in NUMVARS. Separate the names with spaces.
Close data set ENTRANCE_EXAM. If the data set does not close successfully, as evidenced by a non-zero return code, write an error message to the SAS log. Copy the list of character variables that are stored in CHARVARS to macro variable CHARLIST. Copy the list of numeric variables that are stored in NUMVARS to macro variable NUMLIST.
Compute one-way frequency tables for the character variables whose names are saved in macro variable CHARLIST. Compute descriptive statistics for the numeric variables whose names are saved in macro variable NUMLIST.
data _null_; dsid=open('work.entrance_exam','i'), if dsid=0 then do; putlog 'ERROR: Data set ENTRANCE_EXAM could not be opened.'; stop; end; length examvarname $ 32 charvars $ 200 numvars $ 200; totalvars=attrn(dsid,'nvars'), do i=1 to totalvars; examvarname=varname(dsid,i); if index(upcase(examvarname),'ID') gt 0 or index(upcase(examvarname),'DATE') gt 0 then continue; if vartype(dsid,i)='C' then charvars= catx(' ',charvars,examvarname); else if vartype(dsid,i)='N' then numvars= catx(' ',numvars,examvarname); end; rc=close(dsid); if rc ne 0 then putlog 'ERROR: Problem in closing ENTRANCE_EXAM'; call symputx('CHARLIST',charvars); call symputx('NUMLIST',numvars); run; proc freq data=entrance_exam; title "Example 9.3 ENTRANCE_EXAM"; tables &charlist; run; proc means data=entrance_exam; title "Example 9.3 ENTRANCE_EXAM"; var &numlist; run;
The following PROC SQL step selects the same character and numeric analysis columns as the DATA step in the main example. It also stores the lists of columns in the same macro variables, CHARLIST and NUMLIST.
The PROC SQL step extracts column names and column type from the COLUMNS dictionary table. A dictionary table is a read-only PROC SQL table that is supplied by SAS. The COLUMNS dictionary table contains data about the columns in tables. The data that are available include library name, member name, column name, column type, and column label.
Accessible directly through PROC SQL programming or through views in the SASHELP library, these tables can provide you with information about tables, options, external files, and many other objects in your current SAS session. SAS automatically defines the DICTIONARY libref for you when your SAS session starts.
The following PROC SQL step has two SELECT statements: one to select the character columns and save their names in macro variable CHARLIST, and a second to select the numeric columns and save their names in macro variable NUMLIST. A WHERE clause added to each SELECT statement specifies the table from which to select the column data, the type of variable to select, and text that the variable name must not contain.
Example 9.5 also accesses dictionary tables and includes details about the tables in the ``A Closer Look'' section.
Suppress printed output. Select NAME, which is the column (or variable) name in the table. Store the column name values in macro variable NUMLIST and separate them with a space. Select rows from the COLUMNS dictionary table. Specify that the rows selected from DICTIONARY.COLUMNS must be for the WORK.ENTRANCE_EXAM table, that the column type is numeric, and that the column name must not contain the text "DATE" or "ID". Columns LIBNAME, MEMNAME, and TYPE are columns in DICTIONARY.COLUMNS. Specify that the rows selected from DICTIONARY.COLUMNS must be for the WORK.ENTRANCE_EXAM table, that the column type is character, and that the column name must not contain the text "DATE" or "ID".
Compute one-way frequency tables for the character variables whose names are saved in macro variable CHARLIST. Compute descriptive statistics for the numeric variables whose names are saved in macro variable NUMLIST.
proc sql noprint; select name into :numlist separated by ' ' from dictionary.columns where libname='WORK' and memname='ENTRANCE_EXAM' and type='num' and index(upcase(name),'DATE')=0 and index(upcase(name),'ID')=0; select name into :charlist separated by ' ' from dictionary.columns where libname='WORK' and memname='ENTRANCE_EXAM' and type='char' and index(upcase(name),'DATE')=0 and index(upcase(name),'ID')=0; quit; proc freq data=entrance_exam; title "Example 9.3 ENTRANCE_EXAM"; tables &charlist; run; proc means data=entrance_exam; title "Example 9.3 ENTRANCE_EXAM"; var &numlist; run;
The DATA step in the main example uses several SAS file input/output functions. As demonstrated, you can programmatically work with metadata about your SAS data sets in a DATA step without reading the SAS data set.
Typically, you start by opening the SAS file with the OPEN function in the DATA step. This function assigns an identifier to your data set that you can reference later in calls to other functions that extract information about the data set. When the DATA step ends, SAS automatically closes the data set. You can also close it yourself with the CLOSE function before the DATA step ends as shown in the example.
The two functions ATTRN and ATTRC return information about the attributes of a SAS data set. The ATTRN returns attributes that are numeric in value while ATTRC returns attributes that are character in value.
Other useful functions include EXIST, which tests whether a SAS library member exists; PATHNAME, which returns the physical name of an external file or SAS library; and a series of variables that obtain metadata about the variables in a data set. These functions include VARFMT, VARINFMT, VARLABEL, VARLEN, VARNAME, VARNUM, and VARTYPE.
Similar to the SAS file input/output functions is the set of functions that work with external files. These functions can return information about external files, and they can add to and delete data from external files.
For more information about these functions, see SAS documentation.
It is possible to skip the DATA step or PROC SQL step that evaluates variable attributes and names and instead specify the lists of variables in PROC FREQ and PROC MEANS by using a combination of techniques to specify groups of variables:
DROP or KEEP data set options. The following PROC FREQ step does not compute one-way frequency tables for numeric variables or for any variable whose name starts with the text ``ID''.
proc freq data=entrance_exam(drop=_numeric_ id:); run;
_CHARACTER_ and _NUMERIC_ keywords. The following PROC FREQ step computes one-way frequency tables for all character variables.
proc freq data=entrance_exam; tables _character_; run;
Variable range lists that use the _CHARACTER_ and _NUMERIC_ keywords. The following PROC MEANS step computes descriptive statistics on the numeric variables between ESSAY and ATHLETICS in the data set data vector. Even though it is implicit that PROC MEANS can compute descriptive statistics only on numeric variables, SAS generates an error if a character variable is in the list of variables. Character variables ESSAY, SPEECH, LOGIC, and ATHLETICS are within the list and cause the PROC MEANS step to stop when the _NUMERIC_ keyword is omitted.
proc means data=entrance_exam; var essay-numeric-athletics; run;
However, depending on the complexities of including and excluding specific variables using lists, it might be easier to specify the lists by using a process that is similar to the one demonstrated in the DATA step or PROC SQL step.
See also Example 8.15 for its usage of the _CHARACTER_ and _NUMERIC_ keywords in ARRAY statements in a DATA step.
18.188.154.252