2.2. Data Types Overview

The purpose of a database is to store data. A database contains one or more tables (and other components). Tables consist of columns and rows of data. In the SAS implementation of SQL, there are two available data types: 1) numeric and 2) character.

2.2.1. Numeric Data

The SAS implementation of SQL provides programmers with numerous arithmetic, statistical, and summary functions. It offers one numeric data type to represent numeric data. Columns defined as a numeric data type with the NUMERIC or NUM column definition are assigned a default length of 8 bytes, even if the column is created with a numeric length less than 8 bytes. This provides the greatest degree of precision allowed by the SAS System. In the example below, a table called PURCHASES is created consisting of two character and two numeric columns. The resulting table contains no rows of data, as illustrated by the SAS log results. For more information about the CREATE TABLE statement, see Chapter 5, “Creating, Populating, and Deleting Tables.”

SQL Code

PROC SQL;
  CREATE TABLE PURCHASES
    (CUSTNUM  CHAR(4),
     ITEM     CHAR(10),
     UNITS    NUM
     UNITCOST NUM(8,2));
QUIT;

SAS Log Results

       PROC SQL;
         CREATE TABLE PURCHASES
           (CUSTNUM  CHAR(4),
            ITEM     CHAR(10),
            UNITS    NUM,
            UNITCOST NUM(8,2));
  NOTE: Table PURCHASES created, with 0 rows and 4
  columns.
       QUIT;


Results

                 The CONTENTS Procedure

  -----Alphabetic List of Variables and
 Attributes-----

           #    Variable    Type    Len    Pos
           ___________________________________
           1    CUSTNUM     Char      4     16
           2    ITEM        Char     10     20
           3    UNITS       Num       8      0
           4    UNITCOST    Num       8      8


Use the DATA step LENGTH statement to create a column of fewer than 8 bytes. Although this action can cause precision issues, it allows for more efficient use of data storage resources. See the SAS Language Reference: Dictionary for more information. The example illustrates a DATA step that assigns smaller lengths to the two numeric variables, UNITS and UNITCOST, in the PURCHASES table. The CONTENTS output illustrates the creation of shorter length numeric variables.

DATA Step Code

DATA PURCHASES;
  LENGTH CUSTNUM  $4.
         ITEM    $10.
         UNITS     3.
         UNITCOST  4.;
  LABEL CUSTNUM  = 'Customer Number'
        ITEM     = 'Item Purchased'
        UNITS    = '# Units Purchased'
        UNITCOST = 'Unit Cost';
  FORMAT UNITCOST DOLLAR12.2;
RUN;
PROC CONTENTS DATA=PURCHASES;
RUN;

SAS Log Results

   DATA PURCHASES;
     LENGTH CUSTNUM  $4.
            ITEM    $10.
            UNITS     3.
            UNITCOST  4.;
     LABEL CUSTNUM  = 'Customer Number'
            ITEM     = 'Item Purchased'
            UNITS    = '# Units Purchased'
            UNITCOST = 'Unit Cost';
     FORMAT UNITCOST DOLLAR12.2;
   RUN;

   NOTE: Variable CUSTNUM is uninitialized.
   NOTE: Variable ITEM is uninitialized.
   NOTE: Variable UNITS is uninitialized.
   NOTE: Variable UNITCOST is uninitialized.
   NOTE: The data set WORK.PURCHASES has 1
 observations
         and 4 variables.
  NOTE: DATA statement used:
         real time           2.80 seconds

   PROC CONTENTS DATA=PURCHASES;
   RUN;

   NOTE: PROCEDURE CONTENTS used:
         real time           1.82 seconds


CONTENTS Results

                   The CONTENTS Procedure

     -----Alphabetic List of Variables and
 Attributes-----

   #   Variable   Type    Len   Pos    Format     
 Label
   _______________________________________________
_________
   1   CUSTNUM    Char      4     4               
 Customer Number
   2   ITEM       Char     10     8               
 Item Purchased
   3   UNITS      Num       3    18               
 # Units Purchased
   4   UNITCOST   Num       4     0    DOLLAR12.2 
 Unit Cost


2.2.2. Date and Time Column Definitions

Database application processing stores date and time information in the form of a numeric data type. Date and time values are represented internally as an offset where a SAS date value is stored as the number of days from the fixed date value of 01/01/1960 (January 1, 1960). The SAS date value for January 1, 1960, is represented as 0 (zero). A date earlier than this is represented as a negative number, and a date later than this is represented as a positive number. This makes performing date calculations much easier.

The SAS System has integrated the vast array of date and time informats and formats with PROC SQL. The various informats and formats act as input and output templates and describe how date and time information is to be read or rendered on output. See the SAS Language Reference: Dictionary for detailed descriptions of the various informats and formats and their use. Numeric date and time columns, when combined with informats and/or formats, automatically validate values according to the following rules:

DateDate informats and formats enable PROC SQL and the SAS System to determine the month, day, and year values of a date. The month value handles values between 1 through 12. The day value handles values from 1 through 31 and applies additional validations to a maximum of 28, 29, or 30 depending on the month in question. The year value handles values 1 through 9999. Dates go back to 1582 and ahead to 20,000. When you enter a year value of 0001 and specify a format and yearcutoff value of 1920, the returned value would be 2001.
TimeTime informats and formats enable PROC SQL to determine the hour, minute, and second values of a time. The hour portion handles values between 00 and 23. The minute portion handles values from 00 through 59. The second portion handles values from 00 through 59.
DATETIMEDate and time stamps enable PROC SQL to determine the month, day, and year of a date as well the hour, minute, and second of a time.

See Chapter 5, “Creating, Populating, and Deleting Tables” and Chapter 6, “Modifying and Updating Tables and Indexes,” for more information about date and time informats and formats.

2.2.3. Character Data

PROC SQL provides tools to manipulate and store character data including words, text, and codes using the CHARACTER or CHAR data type. The characters allowed by this data type include the ASCII or EBCDIC character sets. The CHARACTER or CHAR data type stores fixed-length character strings consisting of a maximum of 32K characters. If a length is not specified, a CHAR column stores a default of 8 characters.

The SQL programmer has a vast array of SQL and Base SAS functions that can make the task of working with character data considerably easier. In this chapter you’ll learn how columns based on the character data type are defined and how string functions, pattern matching, phonetic matching techniques, and a variety of other techniques are used with character data.

2.2.4. Missing Values and Null

Missing values are an important aspect of dealing with data. The concept of missing values is familiar to programmers, statisticians, researchers, and other SAS users. This section describes what null values are, what they aren’t, and how they are used.

Missing or unknown information is supported by PROC SQL in a form known as a null value. A null value is not the same as a zero value. In the SAS System, nulls are treated as a separate category from known values. A value consisting of zero has a known value. In contrast, a value of null has an unknown quantity and will never be known. For example, a patient who is given an eye exam does not have zero eyesight just because the results from the exam haven’t been received. The correct value to assign in a case like this is a missing or a null value.

In another example, say a person declines to provide their age on a survey. This person’s age is null, not zero. Essentially, this person has an age, but it is unknown. Whenever an unknown value occurs, you have no choice but to assign an unknown value – null.

Since the value of null is unknown, any arithmetic calculation using a null will return a null. This makes a lot of sense since the results of a calculation using a null are not determinable. This is sometimes referred to as the propagation of nulls because when a null value is used in a calculation or an expression, it propagates a null value. For example, if a null is added to a known value, the result is a null value.

2.2.5. Arithmetic and Missing Data

In the SAS System, a numeric data type containing a null value (absence of any value) is represented with a period (.) This representation indicates that the column has not been assigned a value. A null value has no value and is not the same as zero. A value consisting of zero has a known quantity as opposed to a null value that is not known and never will be known.

If a null value is multiplied with a known value, the result is a null value represented with a period (.). In the next example, when UNITS and UNITCOST both have known values, their product will generate a known value, as is illustrated for the Markers purchase below.

SQL Code

PROC SQL;
  SELECT CUSTNUM,
         ITEM,
         UNITS,
         UNITCOST,
         UNITS * UNITCOST AS TOTAL
    FROM PURCHASES
      ORDER BY TOTAL;
QUIT;

Results

                    The SAS System

  custnum  item           units      unitcost    
 TOTAL

       13  Markers            .         $0.99     
    .
        1  Stapler            1         $8.95  8
.949997
        1  Pens              12         $0.89    
 10.68
        7  Mouse Pad          1        $11.79  11
.78999
        1  Paper              4         $6.95  27
.79999
        7  Pens              24         $1.59  38
.15998
        1  Chair              1       $179.00     
  179


2.2.6. SQL Keywords

SQL provides three keywords: AS, DISTINCT, and UNIQUE to perform specific operations on the results. Each will be presented in order as follows.

2.2.6.1. Creating Column Aliases

In situations where data is computed using system functions, statistical functions, or arithmetic operations, a column name or header can be left blank. To prevent this from occurring, users may specify the AS keyword to provide a name to the column or heading itself. The next example illustrates using the AS keyword to prevent the name for the computed column from being assigned a temporary column name similar to: _TEMAxxx. The name assigned with the AS keyword is also used as the column header on output, as shown.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         PRODTYPE,
         PRODCOST * 0.80  AS Discount_Price
    FROM PRODUCTS
      ORDER BY 3;
QUIT;

Results

                        The SAS System

     Product Name               Product Type    
 Discount_Price
     _____________________________________________
______________
     Analog Cell Phone          Phone             
            28
     Office Phone               Phone             
           104
     Digital Cell Phone         Phone             
           140
     Spreadsheet Software       Software          
         239.2
     Graphics Software          Software          
         239.2
     Wordprocessor Software     Software          
         239.2
     Database Software          Software          
         319.2
     Dream Machine              Workstation       
          2560
     Business Machine           Workstation       
          2640
     Travel Laptop              Laptop            
          2720


2.2.6.2. Finding Duplicate Values

In some situations, several rows in a table will contain identical column values. To select only one of each duplicate value, the DISTINCT keyword can be used in the SELECT statement as follows.

SQL Code

PROC SQL;
  SELECT DISTINCT MANUNUM
    FROM INVENTORY;
QUIT;

Results

                    The SAS System

                     Manufacturer
                           Number
                     ____________
                              111
                              170
                              500
                              600


2.2.6.3. Finding Unique Values

In some situations, several rows in a table will contain identical column values. To select each of these duplicate values only once, the UNIQUE keyword can be used in the SELECT statement.

SQL Code

PROC SQL;
  SELECT UNIQUE MANUNUM
    FROM INVENTORY;
QUIT;

Results

                    The SAS System

                     Manufacturer
                           Number
                     ____________
                              111
                              170
                              500
                              600


..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.75.70