Example 9.3 Determining Whether a Variable Is Character or Numeric

Goal

Determine whether a variable is character or numeric to ensure that you have the right type of data for your application.

Example Features

Featured StepDATA step
Featured Step Options and StatementsATTRN, CLOSE, INDEX, OPEN, UPCASE, VARNAME, and VARTYPE functions CALL SYMPUTX routine Open code macro variables
Related TechniquePROC SQL, DICTIONARY.COLUMNS table, INTO clause to create a macro variable, INDEX and UPCASE functions
A Closer LookUsing the SAS File I/O Functions Specifying Variable Lists

Input Data Set

Data set ENTRANCE_EXAM stores exam results for 15 applicants to an academic program. The results are saved in a mix of character and numeric variables.

                                                 ENTRANCE_EXAM

   Obs  id  examversion   examdate  essay  speech  mathematics physics vocabulary  logic  spatial athletics
     1 MXMI    V4.1     07/02/2010 Meets   Below        19        17       14     Exceeds    14    Below
     2 URQU    V4.3     07/31/2010 Meets   Meets         2        14        2     Meets      14    Meets
     3 JOPE    V4.2     07/13/2010 Exceeds Exceeds       5         8        4     Below      17    Exceeds
     4 TYJP    V4.3     07/21/2010 Meets   Exceeds      14         6       10     Meets      18    Meets
     5 YPDF    V4.3     07/30/2010 Exceeds Exceeds       6         2        7     Meets      17    Meets
     6 PRGH    V4.3     07/01/2010 Below   Below        12         7       15     Below       9    Meets
     7 AYPX    V4.3     07/06/2010 Below   Below        20         5        6     Meets      12    Exceeds
     8 DKEG    V4.1     07/23/2010 Meets   Exceeds       4         4        1     Meets      14    Exceeds
     9 UPYM    V4.1     07/03/2010 Below   Meets         2        17        9     Below      16    Below
    10 THCL    V4.1     07/23/2010 Meets   Exceeds       7        19        8     Meets      19    Below
    11 SSPN    V4.2     07/20/2010 Exceeds Meets         6         2        8     Exceeds    14    Exceeds
    12 YRNJ    V4.2     07/25/2010 Exceeds Meets         2        14       17     Below      12    Exceeds
    13 MHVT    V4.1     07/22/2010 Meets   Meets        10        18        4     Exceeds     6    Below
    14 TVKT    V4.2     07/19/2010 Meets   Exceeds       7        13       10     Exceeds     9    Below
    15 HXRV    V4.1     07/27/2010 Below   Exceeds      10         6       13     Meets      15    Below

Resulting Output

Output 9.3a Frequencies of Specific Character Variables in ENTRANCE_EXAM

                        Example 9.3 ENTRANCE_EXAM

                            The FREQ Procedure

                                             Cumulative    Cumulative
     examversion    Frequency     Percent     Frequency       Percent
     ----------------------------------------------------------------
     V4.1                  6       40.00             6         40.00
     V4.2                  4       26.67            10         66.67
     V4.3                  5       33.33            15        100.00

                                           Cumulative     Cumulative
       essay      Frequency     Percent     Frequency       Percent
       ----------------------------------------------------------------
       Below             4       26.67             4         26.67
       Exceeds           4       26.67             8         53.33
       Meets             7       46.67            15        100.00

                                           Cumulative     Cumulative
       speech     Frequency     Percent     Frequency       Percent
       ----------------------------------------------------------------
       Below             3       20.00             3         20.00
       Exceeds           7       46.67            10         66.67
       Meets             5       33.33            15        100.00

                                           Cumulative     Cumulative
       logic      Frequency     Percent     Frequency       Percent
       ----------------------------------------------------------------
       Below             4       26.67             4         26.67
       Exceeds           4       26.67             8         53.33
       Meets             7       46.67            15        100.00

                                            Cumulative     Cumulative
      athletics    Frequency     Percent     Frequency       Percent
      ----------------------------------------------------------------
      Below               6       40.00             6         40.00
      Exceeds             5       33.33            11         73.33
      Meets               4       26.67            15        100.00


Output 9.3b PROC MEANS of Specific Numeric Variables in ENTRANCE_EXAM

                        Example 9.3 ENTRANCE_EXAM

                            The MEANS Procedure

  Variable      N          Mean       Std Dev       Minimum        Maximum
  -----------------------------------------------------------------------
  mathematics  15     8.4000000     5.7669006     2.0000000     20.0000000
  physics      15    10.1333333     6.0811966     2.0000000     19.0000000
  vocabulary   15     8.5333333     4.7639519     1.0000000     17.0000000
  spatial      15    13.7333333     3.6344909     6.0000000     19.0000000
  -----------------------------------------------------------------------


Example Overview

The DATA step in this example examines the names and type attribute of variables in a data set in order to select character variables to analyze with PROC FREQ and to select numeric variables to analyze with PROC MEANS.

Data set ENTRANCE_EXAM contains exam results for 15 applicants to an academic program. The results are a mix of character and numeric variables. The goal is to produce a frequency table for each character exam result variable and to compute descriptive statistics for each numeric exam result variable.

The DATA step applies several data set metadata functions that access data set ENTRANCE_EXAM and evaluate its variable attributes. It does not read the observations in ENTRANCE_EXAM, and it does not create an output data set. The DATA step instead iterates only once and applies the following functions to read descriptive information about the data set and select the analysis variables:

  • open the data set with OPEN

  • determine the number of variables with ATTRN

  • determine the variable name with VARNAME

  • determine the variable type with VARTYPE

  • close the data set with CLOSE

An IF statement excludes from the lists variable names that contain specific text. At the end of the DATA step after all variables have been examined, CALL SYMPUTX saves the names of the selected character variables in global macro variable CHARLIST and the names of the selected numeric variables in global macro variable NUMLIST.

The TABLES statement in PROC FREQ specifies macro variable CHARLIST, which resolves so that PROC FREQ produces a one-way frequency table for each selected character variable. The VAR statement in PROC MEANS specifies macro variable NUMLIST, which resolves so that PROC MEANS produces descriptive statistics for each selected numeric variable.

Program

Start a DATA step. Do not create an output data set. Open data set ENTRANCE_EXAM for input only. Save in DSID the unique data set identifier that OPEN assigns.Write an error message to the SAS log and stop the DATA step when the DSID value indicates that the data set could not be opened.

Define CHARVARS and NUMVARS to be long enough to hold all of the names of the character variables (CHARVARS) and the names of the numeric variables (NUMVARS). Determine the total number of variables in ENTRANCE_EXAM. Execute a DO loop the number of times equal to the number of variables in ENTRANCE_EXAM. Examine the attributes of a variable on each iteration of the loop. Obtain the name of the variable in ENTRANCE_EXAM in the position in the data set data vector that corresponds to the current value of I. Reject variables whose names contain the text string 'ID' or 'DATE'. Return to the top of the DO loop.

If the ith variable in the data set data vector is character, add its name to the list of variables that are saved in CHARVARS. Separate the names with spaces. If the ith variable in the data set data vector is numeric, add its name to the list of variables that are saved in NUMVARS. Separate the names with spaces.

Close data set ENTRANCE_EXAM. If the data set does not close successfully, as evidenced by a non-zero return code, write an error message to the SAS log. Copy the list of character variables that are stored in CHARVARS to macro variable CHARLIST. Copy the list of numeric variables that are stored in NUMVARS to macro variable NUMLIST.

Compute one-way frequency tables for the character variables whose names are saved in macro variable CHARLIST. Compute descriptive statistics for the numeric variables whose names are saved in macro variable NUMLIST.

data _null_;

  dsid=open('work.entrance_exam','i'),


  if dsid=0 then do;
    putlog
     'ERROR: Data set ENTRANCE_EXAM could not be opened.';
    stop;
  end;
  length examvarname $ 32 charvars $ 200 numvars $ 200;



  totalvars=attrn(dsid,'nvars'),

  do i=1 to totalvars;



    examvarname=varname(dsid,i);

    if index(upcase(examvarname),'ID') gt 0 or
       index(upcase(examvarname),'DATE') gt 0 then
               continue;

    if vartype(dsid,i)='C' then charvars=
                catx(' ',charvars,examvarname);



    else if vartype(dsid,i)='N' then numvars=
                    catx(' ',numvars,examvarname);



  end;
  rc=close(dsid);
  if rc ne 0 then
         putlog 'ERROR: Problem in closing ENTRANCE_EXAM';

  call symputx('CHARLIST',charvars);


  call symputx('NUMLIST',numvars);


run;
proc freq data=entrance_exam;
  title "Example 9.3 ENTRANCE_EXAM";
  tables &charlist;
run;
proc means data=entrance_exam;
  title "Example 9.3 ENTRANCE_EXAM";
  var &numlist;
run;

Related Technique

The following PROC SQL step selects the same character and numeric analysis columns as the DATA step in the main example. It also stores the lists of columns in the same macro variables, CHARLIST and NUMLIST.

The PROC SQL step extracts column names and column type from the COLUMNS dictionary table. A dictionary table is a read-only PROC SQL table that is supplied by SAS. The COLUMNS dictionary table contains data about the columns in tables. The data that are available include library name, member name, column name, column type, and column label.

Accessible directly through PROC SQL programming or through views in the SASHELP library, these tables can provide you with information about tables, options, external files, and many other objects in your current SAS session. SAS automatically defines the DICTIONARY libref for you when your SAS session starts.

The following PROC SQL step has two SELECT statements: one to select the character columns and save their names in macro variable CHARLIST, and a second to select the numeric columns and save their names in macro variable NUMLIST. A WHERE clause added to each SELECT statement specifies the table from which to select the column data, the type of variable to select, and text that the variable name must not contain.

Example 9.5 also accesses dictionary tables and includes details about the tables in the ``A Closer Look'' section.

Suppress printed output. Select NAME, which is the column (or variable) name in the table. Store the column name values in macro variable NUMLIST and separate them with a space. Select rows from the COLUMNS dictionary table. Specify that the rows selected from DICTIONARY.COLUMNS must be for the WORK.ENTRANCE_EXAM table, that the column type is numeric, and that the column name must not contain the text "DATE" or "ID". Columns LIBNAME, MEMNAME, and TYPE are columns in DICTIONARY.COLUMNS. Specify that the rows selected from DICTIONARY.COLUMNS must be for the WORK.ENTRANCE_EXAM table, that the column type is character, and that the column name must not contain the text "DATE" or "ID".

Compute one-way frequency tables for the character variables whose names are saved in macro variable CHARLIST. Compute descriptive statistics for the numeric variables whose names are saved in macro variable NUMLIST.

proc sql noprint;
  select name into :numlist separated by ' '

          from dictionary.columns

         where libname='WORK' and memname='ENTRANCE_EXAM'
               and type='num' and index(upcase(name),'DATE')=0
                and index(upcase(name),'ID')=0;





         select name into :charlist separated by ' '
          from dictionary.columns
         where libname='WORK' and memname='ENTRANCE_EXAM' and
               type='char' and index(upcase(name),'DATE')=0
                and index(upcase(name),'ID')=0;


     quit;
     proc freq data=entrance_exam;
       title "Example 9.3 ENTRANCE_EXAM";
       tables &charlist;
     run;
     proc means data=entrance_exam;
       title "Example 9.3 ENTRANCE_EXAM";
       var &numlist;
     run;

A Closer Look

Using the SAS File I/O Functions

The DATA step in the main example uses several SAS file input/output functions. As demonstrated, you can programmatically work with metadata about your SAS data sets in a DATA step without reading the SAS data set.

Typically, you start by opening the SAS file with the OPEN function in the DATA step. This function assigns an identifier to your data set that you can reference later in calls to other functions that extract information about the data set. When the DATA step ends, SAS automatically closes the data set. You can also close it yourself with the CLOSE function before the DATA step ends as shown in the example.

The two functions ATTRN and ATTRC return information about the attributes of a SAS data set. The ATTRN returns attributes that are numeric in value while ATTRC returns attributes that are character in value.

Other useful functions include EXIST, which tests whether a SAS library member exists; PATHNAME, which returns the physical name of an external file or SAS library; and a series of variables that obtain metadata about the variables in a data set. These functions include VARFMT, VARINFMT, VARLABEL, VARLEN, VARNAME, VARNUM, and VARTYPE.

Similar to the SAS file input/output functions is the set of functions that work with external files. These functions can return information about external files, and they can add to and delete data from external files.

For more information about these functions, see SAS documentation.

Specifying Variable Lists

It is possible to skip the DATA step or PROC SQL step that evaluates variable attributes and names and instead specify the lists of variables in PROC FREQ and PROC MEANS by using a combination of techniques to specify groups of variables:

  • DROP or KEEP data set options. The following PROC FREQ step does not compute one-way frequency tables for numeric variables or for any variable whose name starts with the text ``ID''.

      proc freq data=entrance_exam(drop=_numeric_ id:);
      run;

  • _CHARACTER_ and _NUMERIC_ keywords. The following PROC FREQ step computes one-way frequency tables for all character variables.

      proc freq data=entrance_exam;
        tables _character_;
      run;

  • Variable range lists that use the _CHARACTER_ and _NUMERIC_ keywords. The following PROC MEANS step computes descriptive statistics on the numeric variables between ESSAY and ATHLETICS in the data set data vector. Even though it is implicit that PROC MEANS can compute descriptive statistics only on numeric variables, SAS generates an error if a character variable is in the list of variables. Character variables ESSAY, SPEECH, LOGIC, and ATHLETICS are within the list and cause the PROC MEANS step to stop when the _NUMERIC_ keyword is omitted.

       proc means data=entrance_exam;
        var essay-numeric-athletics;
       run;

However, depending on the complexities of including and excluding specific variables using lists, it might be easier to specify the lists by using a process that is similar to the one demonstrated in the DATA step or PROC SQL step.

See also Example 8.15 for its usage of the _CHARACTER_ and _NUMERIC_ keywords in ARRAY statements in a DATA step.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.154.252