Chapter 2: Working with Data in PROC SQL

Introduction. 21

The SELECT Statement and Clauses. 21

Overview of Data Types. 23

Numeric Data. 24

Date and Time Column Definitions. 26

Character Data. 27

Missing Values and NULL. 27

Arithmetic and Missing Data. 28

SQL Keywords. 30

SQL Operators, Functions, and Keywords. 31

Comparison Operators. 32

Logical Operators. 33

Arithmetic Operators. 34

Character String Operators and Functions. 36

Summarizing Data. 51

Predicates. 55

CALCULATED Keyword. 63

Dictionary Tables. 66

Dictionary Tables and Metadata. 67

Displaying Dictionary Table Definitions. 68

Dictionary Table Column Names. 69

Accessing a Dictionary Table’s Contents. 72

Summary. 82

 

Introduction

PROC SQL is essentially a database language as opposed to a procedural or computational language. This chapter’s focus is on working with data in PROC SQL using the SELECT statement. Often referred to as an SQL query, the SELECT statement is the most versatile statement in SQL and is used to read data from one or more database tables (or data sets). It also supports numerous extensions including keywords, operators, functions, and predicates, and returns the data in a table-like structure called a result-set.

The SELECT Statement and Clauses

The SELECT statement’s purpose is to retrieve (or read) data from the underlying tables (or views). Although it supports multiple clauses, the SELECT statement has only one clause that is required to be specified – the FROM clause. All the remaining clauses, described below, are optional and only used when needed. Note: Not every query needs to have all the clauses specified, but SQL provides developers and data analysts with a powerful and flexible language to access, manipulate, and display data without the need to write large amounts of code.

During execution, SAS carries out the tasks associated with planning, optimizing, and performing the operations specified in the SELECT statement and its clauses to produce the desired results. To prevent syntax errors from occurring when using the SELECT statement, the clauses must be specified in the correct order. To help you remember the order of the SELECT statement’s clauses recite, “SQL is fun when geeks help others.” The first letter in each word corresponds to the name of the SELECT statement’s clause as shown in Figure 2.1.

Figure 2.1: Order of the SELECT Statement Clauses

Figure 2.1: Order of the SELECT Statement Clauses

When constructed correctly, the SELECT statement and its clauses declares the database table (or data set) to find the data in, what data to retrieve, and whether any special transformations or processing is needed before the data is returned. The next example shows the correct syntax of a query’s SELECT statement and its clauses.

SQL Code

PROC SQL;

  SELECT PRODNAME

        ,PRODTYPE

        ,PRODCOST

    INTO :M_PRODNAME

        ,:M_PRODTYPE

        ,:M_PRODCOST

      FROM PRODUCTS

        WHERE PRODNAME CONTAINS "Software"

          GROUP BY PRODTYPE

            HAVING COUNT(PRODTYPE) > 3

              ORDER BY PRODNAME;

QUIT;

Results

image shown here

Now that we’ve explored the order that each clause is specified in an SQL query, let’s examine the order of execution of each clause in an SQL query. Table 2.1 illustrates and describes the execution order of each SELECT statement clause.

Table 2.1: Clause Execution Order

Clause Execution Order

Description

1.  FROM Clause

The first clause executed in a query is the FROM clause. It is a required clause with the purpose of determining the working set of data that is being queried (i.e., variable names, variable type, number of rows, and other important information).

2.  INTO Clause

The INTO clause is used to create one or more macro variables where the values can be used to manipulate data in DATA and PROC steps.

3.  WHERE Clause

The WHERE clause is used to subset rows of data based on the condition(s) specified, and rows that aren’t satisfied by the condition(s) are discarded.

4.  GROUP BY Clause

The GROUP BY clause takes the rows that were subset with the WHERE clause and grouped based on common values in the column specified in the GROUP BY clause.

5.  HAVING Clause

The HAVING clause applies the condition(s) to the grouped rows specified in the GROUP BY clause, and any grouped rows that aren’t satisfied by the condition(s) are discarded.

6.  SELECT Statement

Expressions specified in the SELECT statement are processed.

7.  ORDER BY Clause

The ORDER BY clause sorts the rows of data in either ascending (default) or descending order.

 

Overview of Data Types

The purpose of a database is to store data. A database contains one or more tables (and other components). Tables consist of columns and rows of data. In the SAS implementation of SQL, the available data types are limited to only two possibilities:

         numeric

         character

Numeric Data

The SAS implementation of SQL provides programmers with numerous arithmetic, statistical, and summary functions. It offers one numeric data type to represent numeric data. Columns defined as a numeric data type with the NUMERIC or NUM column definition are assigned a default length of 8 bytes, even if the column is created with a numeric length of less than 8 bytes. This provides the greatest degree of precision allowed by SAS software. In the example, a table called PURCHASES is created consisting of four numeric columns. The resulting table contains no rows of data, as illustrated by the SAS log results. For more information about the CREATE TABLE statement, see Chapter 5, “Creating, Populating, and Deleting Tables.”

SQL Code

PROC SQL;

  CREATE TABLE PURCHASES

    (CUSTNUM  NUM,

     PRODNUM  NUM,

     UNITS    NUM,

     UNITCOST NUM(8,2));

QUIT;

SAS Log Results

     PROC SQL;

       CREATE TABLE PURCHASES

         (CUSTNUM  NUM,

          PRODNUM  NUM,

          UNITS    NUM,

          UNITCOST NUM(8,2));

NOTE: Table PURCHASES created, with 0 rows and 4 columns.

     QUIT;

Results

image shown here

Creating a numeric column that is less than 8 bytes requires the use of the DATA step LENGTH statement. Although it is not necessary to assign smaller lengths to numeric columns because it can result in precision issues, doing so can make for more efficient use of data storage system resources. The example illustrates a DATA step that assigns smaller lengths to the four numeric variables in the PURCHASES table: CUSTNUM, PRODNUM, UNITS, and UNITCOST. In contrast to the SAS log results produced by the previous PROC SQL code, the CONTENTS output illustrates the creation of a data set with one record and four user-defined, and shorter length, numeric variables.

DATA Step Code

DATA PURCHASES;

  LENGTH CUSTNUM   4.

         PRODNUM   3.

         UNITS     3.

         UNITCOST  4.;

  LABEL CUSTNUM  = ‘Customer Number’

        PRODNUM  = ‘Product Purchased’

        UNITS    = ‘# Units Purchased’

        UNITCOST = ‘Unit Cost’;

  FORMAT UNITCOST DOLLAR12.2;

RUN;

PROC CONTENTS DATA=PURCHASES;

RUN;

SAS Log Results

DATA PURCHASES;

  LENGTH CUSTNUM   4.

         PRODNUM   3.

         UNITS     3.

         UNITCOST  4.;

   LABEL CUSTNUM  = 'Customer Number'

         PRODNUM  = 'Product Purchased'

         UNITS    = '# Units Purchased'

         UNITCOST = 'Unit Cost';

  FORMAT UNITCOST DOLLAR12.2;

RUN;

 

NOTE: Variable CUSTNUM is uninitialized.

NOTE: Variable PRODNUM is uninitialized.

NOTE: Variable UNITS is uninitialized.

NOTE: Variable UNITCOST is uninitialized.

NOTE: The data set WORK.PURCHASES has 1 observations and 4 variables.

NOTE: DATA statement used:

      real time           2.80 seconds

 

PROC CONTENTS DATA=PURCHASES;

RUN;

 

NOTE: PROCEDURE CONTENTS used:

      real time           1.82 seconds

CONTENTS Results

image shown here

In the next example, a LENGTH= modifier is illustrated in the PROC SQL code to assign smaller lengths to the four numeric variables (columns) in the PURCHASES table: CUSTNUM, PRODNUM, UNITS, and UNITCOST.

PROC SQL Code

PROC SQL;

  CREATE TABLE PURCHASES

   (CUSTNUM  NUM  LENGTH=4

      LABEL=’Customer Number’,

    PRODNUM  NUM  LENGTH=3

      LABEL=’Product Purchased’,

    UNITS    NUM  LENGTH=3

      LABEL=’# Units Purchased’,

    UNITCOST NUM  LENGTH=4

      LABEL=’Unit Cost’);

QUIT;

As shown in the CONTENTS results below, four numeric columns are defined using a LENGTH= column modifier that is smaller than the default length of 8 bytes.

CONTENTS Results

image shown here

Date and Time Column Definitions

Database application processing stores date and time information in the form of a numeric data type. Date and time values are represented internally as an offset where a SAS date value is stored as the number of days from the fixed date value of 01/01/1960 (January 1, 1960). The SAS date value for January 1, 1960, is represented as 0 (zero). A date earlier than this date is represented as a negative number and a date later than this date is represented as a positive number. This makes performing date calculations much easier.

SAS has integrated the vast array of date and time informats and formats with PROC SQL. The various informats and formats act as input and output templates and describe how date and time information is to be read or rendered on output. See the SAS Language Reference: Dictionary for detailed descriptions of the various informats and formats and their use. Numeric date and time columns, when combined with informats and/or formats, automatically validate values according to the following rules:

         Date—Date informats and formats enable PROC SQL and SAS to determine the month, day, and year values of a date. The month value handles values from 1 through 12. The day value handles values from 1 through 31 and applies additional validations to a maximum of 28, 29, or 30 depending on the month in question. The year value handles values from 1 through 9999. Dates go back to 1582 and ahead to 20,000. When you enter a year value of 0001 and specify a format and a Yearcutoff value of 1920, the returned value would be 2001.

         Time —Time informats and formats enable PROC SQL to determine the hour, minute, and second values of a time. The hour portion handles values from 00 through 23. The minute portion handles values from 00 through 59. The second portion handles values from 00 through 59.

         DATETIME—Date and time stamps enable the SQL procedure to determine the month, day, and year of a date as well the hour, minute, and second of a time.

See Chapter 5, “Creating, Populating, and Deleting Tables” and Chapter 6, “Modifying and Updating Tables and Indexes” for more information about date and time informats and formats.

Character Data

PROC SQL provides tools to manipulate and store character data including words, text, and codes using the CHARACTER or CHAR data type. The characters allowed by this data type include the ASCII or EBCDIC character sets. The CHARACTER or CHAR data type stores fixed-length character strings consisting of a maximum of 32K characters. If a length is not specified, a CHAR column stores a default of 8 characters.

The SQL programmer has a vast array of SQL and Base SAS functions that can make the task of working with character data considerably easier. In this chapter, you’ll learn how columns based on the character data type are defined, and how string functions, pattern matching, phonetic matching techniques, and a variety of other techniques are used with character data.

Missing Values and NULL

Missing values are an important aspect when dealing with data. The concept of missing values is familiar to programmers, statisticians, researchers, and other SAS users. This chapter describes what NULL values are, what they aren’t, and how they are used.

Missing or unknown information is supported by PROC SQL in a form known as a null value. A null value is not the same as a zero value. In SAS, null values are treated as a separate category from known values. A value consisting of zero has a known value. In contrast, a value of null has an unknown quantity and will never be known. For example, a patient who is given an eye exam does not have zero eyesight just because the results from the exam haven’t been received. The correct value to assign in a case like this is a missing or a null value.

In another example, say a person declines to provide their age on a survey. This person’s age is null, not zero. Essentially, this person has an age, but it is unknown. Whenever an unknown value occurs, you have no choice but to assign an unknown value—null.

Because the value of null is unknown, any arithmetic calculation using a null will return a null. This makes a lot of sense because the results of a calculation using a null are not determinable. This is sometimes referred to as the propagation of nulls because when a null value is used in a calculation or an expression, it propagates a null value. For example, if a null is added to a known value, then the result is a null value.

Arithmetic and Missing Data

In SAS, a numeric data type containing a null value (absence of any value) is represented with a period (.). This representation indicates that the column has not been assigned a value. A null value has no value and is not the same as zero. A value consisting of zero has a known quantity as opposed to a null value that is not known and never will be known.

If a null value is multiplied with a known value, the result is a null value represented with a period (.). In the next example, when UNITS and UNITCOST both have known values, their product will generate a known value, as illustrated below.

SQL Code

PROC SQL;

  SELECT CUSTNUM,

         PRODNUM,

         UNITS,

         UNITCOST,

         UNITS * UNITCOST AS Total FORMAT=DOLLAR12.2

    FROM PURCHASES

      ORDER BY Total;

QUIT;

Results

image shown here

image shown here

SQL Keywords

SQL provides three keywords: AS, DISTINCT, and UNIQUE to perform specific operations on the results. Each will be presented in order as follows.

Creating Column Aliases

In situations where data is computed using system functions, statistical functions, or arithmetic operations, a column name or header can be left blank. To prevent this from occurring, users may specify the AS keyword to provide a name to the column or heading itself along with formatting directions. The next example illustrates using the AS keyword to prevent the name for the computed column from being assigned a temporary column name similar to _TEMAxxx. The name assigned with the AS keyword is also used as the column header on output, as shown below.

SQL Code

PROC SQL;

  SELECT PRODNAME,

         PRODTYPE,

         PRODCOST * 0.80 AS Discount_Price FORMAT=DOLLAR9.2

    FROM PRODUCTS

      ORDER BY 3;

QUIT;

Results

image shown here

Finding Duplicate Values

In some situations, several rows in a table may contain identical or duplicate column values. To select only one of each identical or duplicate values, SAS supports and processes the DISTINCT and UNIQUE keywords the same and without any noticeable performance differences.  On another note, the ANSI standards support the DISTINCT keyword as the keyword of choice with SELECT statements enabling greater code portability to other databases. The DISTINCT keyword can be used in the SELECT statement as follows.

SQL Code

PROC SQL;

  SELECT DISTINCT MANUNUM

    FROM INVENTORY;

QUIT;

Results

image shown here

Finding Unique Values

In some situations, several rows in a table will contain identical column values. To select each of these duplicate values only once, the UNIQUE keyword can be used in the SELECT statement.

SQL Code

PROC SQL;

  SELECT UNIQUE MANUNUM

    FROM INVENTORY;

QUIT;

Results

image shown here

SQL Operators, Functions, and Keywords

SQL programmers have a number of ways to accomplish their objectives, particularly when the goal is to retrieve and work with data. The SELECT statement is an extremely powerful statement in the SQL language. Its syntax can be somewhat complex because of the number of ways that columns, tables, operators, functions, and predicates can be combined into executable statements.

There are several types of operators and functions in PROC SQL:

         comparison operators

         logical operators

         arithmetic operators

         character string operators

         summary functions

         predicates

         keywords

Operators and functions provide value-added features for PROC SQL programmers. Each will be presented below.

Comparison Operators

Comparison operators are used in the SQL procedure to compare one character or numeric value to another. As in the DATA step, SQL comparison operators, mnemonics, and their descriptions appear in the following table.

SAS Operator

Mnemonic Operators

Description

=

EQ

Equal to

^=  or  ¬=

NE

Not equal to

LT

Less than

<=

LE

Less than or equal to

GT

Greater than

>=

GE

Greater than or equal to

Suppose you want to select only those products from the PRODUCTS table that cost more than $300.00. The example illustrates the use of the greater than sign (>) in a WHERE clause to select products meeting the condition.

SQL Code

PROC SQL;

  SELECT PRODNAME,

         PRODTYPE,

         PRODCOST

    FROM PRODUCTS

      WHERE PRODCOST > 300;

QUIT;

Results

image shown here

PROC SQL also supports the use of truncated string comparison operators. These operators work by first truncating the longer string to the same length as the shorter string, and then perform the specified

 

comparison. The result of using any of the comparison operators has no permanent affect on the strings themselves. The list of truncated string comparison operators and their meanings appear below.

Truncated String Comparison Operator

Description

EQT

Equal to

GTT

Greater than

LTT

Less than

GET

Greater than or equal to

LET

Less than or equal to

NET

Not equal to

Logical Operators

Logical operators are used to connect two or more expressions together in a WHERE or HAVING clause. The available logical operators consist of AND, OR, and NOT. Suppose you want to select only those software products that cost more than $300.00. The example illustrates how the AND operator is used to ensure that both conditions are true.

SQL Code

PROC SQL;

  SELECT PRODNAME,

         PRODTYPE,

         PRODCOST

    FROM PRODUCTS

      WHERE PRODTYPE = ‘Software’ AND

            PRODCOST > 300;

QUIT;

Results

image shown here

The next example illustrates the use of the OR logical operator to select software products or products that cost more than $300.00.

SQL Code

PROC SQL;

  SELECT PRODNAME,

         PRODTYPE,

         PRODCOST

    FROM PRODUCTS

      WHERE PRODTYPE = ‘Software’ OR

            PRODCOST > 300;

QUIT;

Results

image shown here

The next example illustrates the use of the NOT logical operator to select products that are not software products and do not cost more than $300.00. Should PRODTYPE contain any value other than “Software,” including a null value, the resulting output would include the row.

SQL Code

PROC SQL;

  SELECT PRODNAME,

         PRODTYPE,

         PRODCOST

    FROM PRODUCTS

      WHERE NOT PRODTYPE = ‘Software’ AND

            NOT PRODCOST > 300;

QUIT;

Results

image shown here

Arithmetic Operators

The arithmetic operators used in PROC SQL are the same operators that are used in the DATA step as well as those found in other languages such as C, Pascal, FORTRAN, and COBOL. The arithmetic operators available in the SQL procedure appear below.

Operator

Description

+

Addition

-

Subtraction

*

Multiplication

/

Division

**

Exponent (raises to a power)

=

Equals

To illustrate how arithmetic operators are used, suppose you want to apply a discount of 20% to the product price (PRODCOST) in the PRODUCTS table. Note that the computed column (PRODCOST * 0.80) does not automatically create a column header on output.

SQL Code

PROC SQL;

  SELECT PRODNAME,

         PRODTYPE,

         PRODCOST * 0.80

    FROM PRODUCTS;

QUIT;

Results

image shown here

In the next example, suppose you wanted to reference a column that was calculated in the SELECT statement. PROC SQL allows references to a computed column in the same SELECT statement (or a WHERE clause) using the CALCULATED keyword. Note that the computed columns have column aliases created for them using the AS keyword. If the CALCULATED keyword were not specified preceding the calculated column, an error would have been generated.  The results were produced in ascending order by the discounted price.

SQL Code

PROC SQL;

  SELECT PRODNAME,

         PRODTYPE,

         PRODCOST * 0.80 AS DISCOUNT_PRICE

                  FORMAT=DOLLAR9.2,

 

         PRODCOST – CALCULATED DISCOUNT_PRICE AS LOSS

                  FORMAT=DOLLAR7.2

    FROM PRODUCTS

      ORDER BY 3;

QUIT;

Results

image shown here

Character String Operators and Functions

Character string operators and functions are typically used with character data. Numerous operators are presented to acquaint programmers with the power available with in SQL procedure. As you become familiar with each operator, you’ll find their real strength as you begin to nest functions within each other.

Concatenating Strings

The following example illustrates a basic concatenation operator that is used to concatenate two columns and a text string. Note that the created column is without a name and has a total length of 23 characters. For more details and special formatting considerations, the concatenation operator “||” will be discussed in greater detail in Chapter 3, “Formatting Output.”

SQL Code

PROC SQL;

  SELECT MANUCITY || “,” || MANUSTAT

    FROM MANUFACTURERS;

QUIT;

 

Results

image shown here

Two other effective methods of concatenating columns and/or text strings in SQL operations is to use the special concatenation functions, CAT or CATS. The next example illustrates a CAT function being used to concatenate two columns and a text string. The CAT function does not remove leading and trailing blanks, and returns a maximum value of 200 characters in a concatenated character string in PROC SQL.

SQL Code

 PROC SQL;

   SELECT CAT(MANUCITY,“,”, MANUSTAT)

     FROM MANUFACTURERS;

 QUIT;

Results

image shown here

The CATS function can also be used in PROC SQL. Its purpose is to remove leading and trailing blanks, and return a maximum value of 200 characters in a concatenated character string in PROC SQL.

 

Finding the Length of a String

The LENGTH function is used to obtain the length of a character string column. LENGTH returns a number equal to the number of characters in the argument. Note that the computed column (LENGTH(PRODNAME)) has a column header created for it called Length by specifying the AS keyword. This example illustrates using the LENGTH function to determine the length of data values.

SQL Code

PROC SQL;

  SELECT PRODNUM,

         PRODNAME,

         LENGTH(PRODNAME) AS Length

    FROM PRODUCTS;

QUIT;

Results

image shown here

Combining Functions and Operators

As in the DATA step, many functions can be used in the SQL procedure. To modify one or more existing rows in a table, the UPDATE statement is used (see Chapter 6, “Modifying and Updating Tables and Indexes,” for more details). The UPDATE statement with the SET clause changes the contents of a data value (functioning the same way as a DATA step assignment statement) by assigning a new value to the column identified to the left of the equal sign by a constant or expression referenced to the right of the equal sign.

The UPDATE statement does not automatically produce any output except for the log messages that are based on the operation results itself. To illustrate the use of DATA step functions and operators in the SQL procedure, the next example shows a SCAN function that isolates the first piece of information from product name (PRODNAME), a TRIM function to remove trailing blanks from product type (PRODTYPE), and a concatenation operator “||” that concatenates the first character expression with the second expression. It should be noted that care should be exercised when using the SCAN function because it returns a 200-byte string.

SQL Code

PROC SQL;

  UPDATE PRODUCTS

    SET PRODNAME = SCAN(PRODNAME,1) || TRIM(PRODTYPE);

QUIT;

SAS Log Results

     PROC SQL;

       UPDATE PRODUCTS

         SET PRODNAME = SCAN(PRODNAME,1) || TRIM(PRODTYPE);

NOTE: 10 rows were updated in PRODUCTS.

     QUIT;

An optional WHERE clause can be specified to limit the number of rows to which modifications will be applied. The next example illustrates using a WHERE clause to restrict the number of rows that are updated in the previous example to just “phone,” excluding all the other rows.

SQL Code

PROC SQL;

  UPDATE PRODUCTS

    SET PRODNAME = SCAN(PRODNAME,1) || TRIM(PRODTYPE)

      WHERE PRODTYPE IN (‘Phone’);

QUIT;

SAS Log Results

     PROC SQL;

       UPDATE PRODUCTS

         SET PRODNAME = SCAN(PRODNAME,1) || TRIM(PRODTYPE)

           WHERE PRODTYPE IN (‘Phone’);

NOTE: 3 rows were updated in PRODUCTS.

     QUIT;

Aligning Characters

The default alignment for character data is to the left; however, character columns or expressions can also be aligned to the right. Two functions are available for character alignment: LEFT and RIGHT. The next example combines the concatenation operator “||” and the TRIM function with the LEFT function to left align a character expression while inserting a comma “,” and blank between the columns.

SQL Code

PROC SQL;

  SELECT LEFT(TRIM(MANUCITY) || “, ” || MANUSTAT)

    FROM MANUFACTURERS;

QUIT;

Results

image shown here

The next example illustrates how character data can be right aligned using the RIGHT function.

SQL Code

PROC SQL;

  SELECT RIGHT(MANUCITY)

    FROM MANUFACTURERS;

QUIT;

Results

image shown here

Finding the Occurrence of a Pattern with INDEX

To find the occurrence of a pattern, the INDEX function can be used. Frequently, requirements call for a column to be searched using a specific character string. The INDEX function can be used in the SQL procedure to search for patterns in a character string. The character string is searched from left to right for the first occurrence of the specified value. If the desired string is found, the column position of the first character is returned. Otherwise, a value of zero (0) is returned. The following arguments are used to search for patterns in a column: the character column or expression, and the character string to search for. To find all products with the characters “phone” in the product name (PRODNAME) column, the following code can be specified:

 

SQL Code

PROC SQL;

  SELECT PRODNUM,

         PRODNAME,

         PRODTYPE

    FROM PRODUCTS

      WHERE INDEX(PRODNAME, 'phone') > 0;

QUIT;

SAS Log Results

     PROC SQL;

       SELECT PRODNUM,

              PRODNAME,

              PRODTYPE

         FROM PRODUCTS

           WHERE INDEX(PRODNAME, 'phone') > 0;

NOTE: No rows were selected.

     QUIT;

Analysis

As in the DATA step, no rows were selected because the search is case sensitive and “phone” is specified as all lowercase characters.

Changing the Case in a String

SAS provides two functions that enable you to change the case of a string’s characters: LOWCASE and UPCASE. The LOWCASE function converts all of the characters in a string or expression to lowercase characters. The UPCASE function converts all of the characters in a string or expression to uppercase characters.

Getting back to the previous example, the results of the search were negative even though the character string “phone” appeared multiple times in more than one row. In order to make this search recognize all the possible lower- and uppercase variations of the word “phone,” the search criteria in the WHERE clause could be made “smarter” by combining an UPCASE function with the INDEX function as follows.

SQL Code

PROC SQL;

  SELECT PRODNUM,

         PRODNAME,

         PRODTYPE

    FROM PRODUCTS

      WHERE INDEX(UPCASE(PRODNAME), 'PHONE') > 0;

QUIT;

Results

image shown here

In the next example, the LOWCASE function is combined with the INDEX function to produce the identical output from the previous example.

SQL Code

PROC SQL;

  SELECT PRODNUM,

         PRODNAME,

         PRODTYPE

    FROM PRODUCTS

      WHERE INDEX(LOWCASE(PRODNAME), 'phone') > 0;

QUIT;

Results

image shown here

Extracting Information from a String

Occasionally, processing requirements call for specific pieces of information to be extracted from a column. In these situations the SUBSTR function can be used with a character column by specifying a starting position and the number of characters to extract. The following example illustrates how the SUBSTR function is used to capture the first 4 bytes from the product type (PRODTYPE) column.

SQL Code

PROC SQL;

  SELECT PRODNUM,

         PRODNAME,

         PRODTYPE,

         SUBSTR(PRODTYPE,1,4)

    FROM PRODUCTS

      WHERE PRODCOST > 100.00;

QUIT;

Results

image shown here

Phonetic Matching (Sounds-Like Operator =*)

A technique for finding names that sound alike or have spelling variations is available in the SQL procedure. This frequently used technique is referred to as phonetic matching and is performed using the Soundex algorithm. In Joe Celko’s popular book SQL for Smarties: Advanced SQL Programming,  (Morgan Kaufman, 2014), he traced the origins of the Soundex algorithm to the developers Margaret O’Dell and Robert C. Russell in 1918. Developed before the first computer, clerks often used the algorithm to manually search for similar sounding names.

Although not technically a function, the sounds-like operator “=*” searches and selects character data based on two expressions: the search value and the matched value. Anyone that has looked for a last name in a local telephone directory is quickly reminded of the possible phonetic variations.

To illustrate how the sounds-like operator works, we will search on CUSTNAME in the CUSTOMERS2 table. The CUSTOMERS2 table is illustrated below. Although each name has phonetic variations and sounds the same, the results of “Laughler,” “Loffler,” and “Laffler” are spelled differently (illustrated below). The following PROC SQL code uses the sounds-like operator to find all customers that sound like “Lafler.”

CUSTOMERS2 Table

CUSTNUM    CUSTNAME    CUSTCITY

    1      Smith       San Diego

    7      Lafler      Spring Valley

   11      Jones       Carmel

   13      Thompson    Miami

    7      Loffler     Spring Valley

    1      Smithe      San Diego

    7      Laughler    Spring Valley

    7      Laffler     Spring Valley

 

SQL Code

PROC SQL;

  SELECT CUSTNUM,

         CUSTNAME,

         CUSTCITY

    FROM CUSTOMERS2

      WHERE CUSTNAME =* 'Lafler';

QUIT;

Results

image shown here

Readers familiar with the DATA step SOUNDEX(<argument>) function to search a string are cautioned that it cannot be used in an SQL WHERE clause. Instead, the sounds-like operator “=*” must be specified; otherwise, a result of no rows will be selected.

Notice that only three of the four possible phonetic matches were selected in the preceding example (i.e., Lafler, Loffler, and Laffler). The fourth possibility, “Laughler” was not chosen as a “matched” value in the search by the sounds‑like algorithm. In an attempt to overcome the inherent limitation with the sounds-like operator, as described in Joe Celko’s popular book SQL for Smarties: Advanced SQL Programming, (Morgan Kaufman, 2014), and to derive a broader list of “matched” values, programmers should make every attempt to develop a comprehensive list of search values to widen the scope of possibilities. We can expand our original search criteria in the previous example to include the missing possibilities using OR logic.

SQL Code

PROC SQL;

  SELECT CUSTNUM,

         CUSTNAME,

         CUSTCITY

    FROM CUSTOMERS2

      WHERE CUSTNAME =* 'Lafler'   OR

            CUSTNAME =* 'Laughler' OR

            CUSTNAME =* 'Lasler';

QUIT;

Results

image shown here

Finding the First Nonmissing Value

The first example provides a way to find the first non-missing value in a column or list. Specified in a SELECT statement, the COALESCE function inspects a column, or in the case of a list scans the arguments from left to right, and returns the first non-missing or non-null value. If all values are missing, the result is missing. To take advantage of the COALESCE function, all arguments must be of the same data type. The next example illustrates one approach on computing the total cost for each product purchased from the number of units and unit costs columns in the PURCHASES table. If either the UNITS column or the UNITCOST column contains a missing value, a zero is assigned by the programmer to prevent the propagation of missing values.

SQL Code

PROC SQL;

  SELECT CUSTNUM,

         PRODNUM,

         UNITS,

         UNITCOST,

         (COALESCE(UNITS, 0) * COALESCE(UNITCOST, 0))

               AS TOTCOST FORMAT=DOLLAR10.2

    FROM PURCHASES;

QUIT;

Results

image shown here

image shown here

Producing a Row Number

A unique undocumented, unsupported feature for producing a row (observation) count can be obtained with the MONOTONIC( ) function. Similar to the row numbers produced and displayed in output from the PRINT procedure (without the NOOBS option specified), the MONOTONIC() function displays row numbers, too. The MONOTONIC() function automatically creates a column (variable) in the output results or in a new table. Because this is an undocumented feature and is not supported in the SQL procedure, users are cautioned to use care when using the MONOTONIC() function because it is possible to obtain duplicate or missing values. The next example illustrates the creation of a row number using the MONOTONIC() function in a SELECT statement.

SQL Code

PROC SQL;

  SELECT MONOTONIC() AS Row_Number FORMAT=COMMA6.,

         PRODNUM,

         UNITS,

         UNITCOST

    FROM PURCHASES;

QUIT;

Results

image shown here

image shown here

A row number can also be produced with the documented and supported SQL procedure option, NUMBER. Unlike the MONOTONIC() function, the NUMBER option does not create a new column in a new table. The NUMBER option is illustrated below.

SQL Code

PROC SQL NUMBER;

  SELECT PRODNUM,

         UNITS,

         UNITCOST

    FROM PURCHASES;

QUIT;

Results

image shown here

image shown here

image shown here

Summarizing Data

The SQL procedure is a wonderful tool for summarizing (or aggregating) data. It provides a number of useful summary (or aggregate) functions to help perform calculations, descriptive statistics, and other aggregating operations in a SELECT statement or HAVING clause. These functions are designed to summarize information and not display detail about data.

Without the availability of summary functions, you would have to construct the necessary logic using somewhat complicated SQL programming constructs. When using a summary function without a GROUP BY clause, (see Chapter 3, “Formatting Output”), all the rows in a table are treated as a single group. Consequently, the results are often a single row value.

A number of summary functions are available including facilities to count non‑missing values; determine the minimum and maximum values in specific columns; return the range of values; compute the mean, standard deviation, and variance of specific values; and other aggregating functions. In the following table, an alphabetical listing of the available summary functions is displayed and, when multiple names for the same function are available, the ANSI‑approved name appears first.

Summary Function

Description

AVG, MEAN

Average or mean of values

COUNT, FREQ, N

Aggregate number of non-missing values

CSS

Corrected sum of squares

CV

Coefficient of variation

MAX

Largest value

MIN

Smallest value

NMISS

Number of missing values

PRT

Probability of a greater absolute value of Student’s t

RANGE

Difference between the largest and smallest values

STD

Standard deviation

STDERR

Standard error of the mean

SUM

Sum of values

SUMWGT

Sum of the weight variable values which is 1

T

Testing the hypothesis that the population mean is zero

USS

Uncorrected sum of squares

VAR

Variance

The next example uses the COUNT function with the (*) argument to produce a total number of rows, whether data is missing or not. The asterisk (*) is specified as the argument to the COUNT function to count all rows in the PURCHASES table.

SQL Code

PROC SQL;

  SELECT COUNT(*) AS Row_Count

    FROM PURCHASES;

QUIT;

Results

image shown here

Unlike the COUNT(*) function syntax that counts all rows, whether data is missing or not, the next example uses the COUNT function with the (column‑name) argument to produce a total number of non-missing rows based on the UNITS column.

SQL Code

PROC SQL;

  SELECT COUNT(UNITS) AS Non_Missing_Row_Count

    FROM PURCHASES;

QUIT;

Results

image shown here

The MIN summary function can be specified to determine what the least expensive product is in the PRODUCTS table.

SQL Code

PROC SQL;

  SELECT MIN(prodcost) AS Cheapest

            Format=dollar9.2 Label=’Least Expensive’

    FROM PRODUCTS;

QUIT;

Results

image shown here

In the next example, the SUM function is specified to sum numeric data types for a selected column. Suppose you want to determine the total costs of all purchases by customers who bought workstations (PRODNUM=1110 and 1200) and laptops (PRODNUM=1700). You could construct the following query to sum all non-missing values for customers who purchased workstations and laptops in the PURCHASES table.

SQL Code

PROC SQL;

  SELECT SUM((UNITS) * (UNITCOST))

               AS Total_Purchases FORMAT=DOLLAR12.2

    FROM PURCHASES

      WHERE PRODNUM = 1110 OR

            PRODNUM = 1200 OR

            PRODNUM = 1700;

QUIT;

Results

image shown here

Data can also be summarized down rows (observations) as well as across columns (variables). This flexibility gives SAS users an incredible range of power, and the ability to take advantage of several summary functions that are supplied (or built-in) by SAS. These techniques permit the average of quantities rather than the set of all quantities. Without the ability to summarize data in PROC SQL, users would be forced to write complicated formulas and/or routines, or even write and test DATA step programs to summarize data. Two examples will be illustrated to show how SQL can be constructed to summarize data:

         Summarizing data down rows

         Summarizing data across columns

Summarizing Data Down Rows

The SQL procedure can be used to produce a single aggregate value by summarizing data down rows (or observations). The advantages of using a summary function in PROC SQL is that it will generally compute the aggregate quicker than if a user-defined equation were constructed, and it saves the effort of having to construct and test a program containing the user-defined equation in the first place. Suppose you want to know the average product cost for all software in the PRODUCTS table containing a variety of products. The following query computes the average product cost and produces a single aggregate value using the AVG function.

SQL Code

PROC SQL;

 SELECT AVG(PRODCOST) AS

      AVERAGE_PRODUCT_COST FORMAT=DOLLAR10.2

  FROM PRODUCTS

   WHERE UPCASE(PRODTYPE) IN

         (“SOFTWARE”);

QUIT;

Results

image shown here

Summarizing Data Across Columns

When a computation is needed on two or more columns in a row, the SQL procedure can be used to summarize data across columns. Suppose you want to know the average cost of products in inventory. The next example computes the average inventory cost for each product without using a summary function, and once computed displays the value for each row as AVERAGE_PRICE.

SQL Code

PROC SQL;

 SELECT PRODNUM,

        (INVPRICE / INVQTY) AS

          AVERAGE_PRICE

            FORMAT=DOLLAR8.2

  FROM INVOICE;

QUIT;

Results

image shown here

Predicates

Predicates are used in PROC SQL to perform direct comparisons between two conditions or expressions. Six predicates will be looked at:

         BETWEEN

         IN

         IS NULL

         IS MISSING

         LIKE

         EXISTS

Selecting a Range of Values

The BETWEEN predicate is a way of simplifying a query by selecting column values within a designated range of values. BETWEEN is equivalent to an AND combination of one LE (less than or equal) and one GE (greater than or equal) conditions; inclusive of both endpoints. It is extremely flexible because it works with character, numeric, and date values. Programmers can also combine two or more BETWEEN predicates with AND or OR operators for more complicated conditions. In the next example, a range of products costing between $200.00 and $500.00 inclusively are selected from the PRODUCTS table.

SQL Code

PROC SQL;

  SELECT PRODNAME,

         PRODTYPE,

         PRODCOST

    FROM PRODUCTS

      WHERE PRODCOST BETWEEN 200 AND 500;

QUIT;

Results

image shown here

In the next example, products are selected from the INVENTORY table that were ordered between the years 1999 and 2000. The YEAR function returns the year portion from a SAS date value and is used as the range of values in the WHERE clause.

SQL Code

PROC SQL;

  SELECT PRODNUM,

         INVENQTY,

         ORDDATE

    FROM INVENTORY

      WHERE YEAR(ORDDATE) BETWEEN 1999 AND 2000;

QUIT;

Results

image shown here

The BETWEEN predicate and OR operator are used together in the next example to select products ordered between 1999 and 2000 or where inventory quantities are greater than 15. The YEAR function returns the year portion from a SAS date value and is used as the range of values in the WHERE clause.

SQL Code

PROC SQL;

  SELECT PRODNUM,

         INVENQTY,

         ORDDATE

    FROM INVENTORY

 

      WHERE (YEAR(ORDDATE) BETWEEN 1999 AND 2000) OR

             INVENQTY > 15;

QUIT;

Results

image shown here

Selecting Nonconsecutive Values

The IN predicate selects one or more rows based on the matching of one or more column values in a set of values. The IN predicate creates an OR condition between each value and returns a Boolean value of True if a column value is equal to one or more of the values in the expression list. Although the IN predicate can be specified with single column values, it may be less costly to specify the “=” sign instead. The “=” sign is used in the next example rather than the IN predicate to select phones from the PRODUCTS table.

SQL Code

PROC SQL;

  SELECT PRODNAME,

         PRODTYPE,

         PRODCOST

    FROM PRODUCTS

      WHERE UPCASE(PRODTYPE) = ‘PHONE’;

QUIT;

Results

image shown here

In the next example, both phones and software products are selected from the PRODUCTS table. To avoid having to specify two OR conditions, the IN predicate is specified.

SQL Code

PROC SQL;

  SELECT PRODNAME,

         PRODTYPE,

         PRODCOST

    FROM PRODUCTS

      WHERE UPCASE(PRODTYPE) IN (‘PHONE’, ‘SOFTWARE’);

QUIT;

Results

image shown here

Testing for NULL or MISSING Values

The IS NULL predicate is the ANSI approach of selecting one or more rows by evaluating whether a column value is missing or null (see the “Missing Values and Null section). The next example selects products from the INVENTORY table that are out-of-stock in inventory.

SQL Code

PROC SQL;

  SELECT PRODNUM,

         INVENQTY,

         INVENCST

    FROM INVENTORY

      WHERE INVENQTY IS NULL;

QUIT;

SAS Log Results

     PROC SQL;

       SELECT PRODNUM,

              INVENQTY,

              INVENCST

         FROM INVENTORY

           WHERE INVENQTY IS NULL;

NOTE: No rows were selected.

     QUIT;

NOTE: PROCEDURE SQL used:

      real time           0.05 seconds

The next example selects products from the INVENTORY table that are currently stocked in inventory. Note that the predicates NOT IS NULL or IS NOT NULL can be specified to produce the same results.

SQL Code

PROC SQL;

  SELECT PRODNUM,

         INVENQTY,

         INVENCST

    FROM INVENTORY

      WHERE INVENQTY IS NOT NULL;

QUIT;

Results

image shown here

The IS MISSING predicate performs identically to the IS NULL predicate by selecting one or more rows containing a missing value (null). The only difference is that specifying IS NULL is the ANSI standard way of expressing the predicate and IS MISSING is commonly used in SAS. The next example uses the IS MISSING with the NOT predicate to select products from the INVENTORY table that are stocked in inventory.

SQL Code

PROC SQL;

  SELECT PRODNUM,

         INVENQTY,

         INVENCST

    FROM INVENTORY

      WHERE INVENQTY IS NOT MISSING;

QUIT;

Finding Patterns in a String (Pattern Matching % and _)

Constructing specific search patterns in string expressions is a simple process with the LIKE predicate. The % (percent sign) acts as a wildcard character representing any number of characters, including any combination of upper or lower case characters. Combining the LIKE predicate with the % permits case‑sensitive searches and is a popular technique used by savvy SQL programmers to find patterns in their data.

Using the LIKE operator with the % provides a wildcard capability enabling the selection of table rows that match a specific pattern. The LIKE predicate is case‑sensitive and should be used with care. To find patterns in product name (PRODNAME) containing the uppercase character “A” in the first position followed by any number of characters is specified with the following WHERE clause.

SQL Code

PROC SQL;

  SELECT PRODNAME

    FROM PRODUCTS

      WHERE PRODNAME LIKE ‘A%’;

QUIT;

Results

image shown here

The next example illustrates the wildcard character “%” preceding and following the search word to select all products whose name contains the word “Soft” in its name. The resulting output contains product types such as “Software” and any other products containing the word “Soft”.

SQL Code

PROC SQL;

  SELECT PRODNAME,

         PRODTYPE,

         PRODCOST

    FROM PRODUCTS

      WHERE PRODTYPE LIKE ‘%Soft%’;

QUIT;

Results

image shown here

In the next example, the LIKE predicate is used to check a column for the existence of trailing blanks. The wildcard character % followed by a blank space is specified as the search argument.

SQL Code

PROC SQL;

  SELECT PRODNAME

    FROM PRODUCTS

      WHERE PRODNAME LIKE  ‘% ’;

QUIT;

Results

image shown here

When a pattern search for a specific number of characters is needed, using the LIKE predicate with the underscore (_) provides a way to pattern match character-by-character. Thus, a single underscore (_) in a specific position acts as a wildcard placement holder for that position only. Two consecutive underscores (__) act as a wildcard placement holder for those two positions. Three consecutive underscores act as a wildcard placement holder for those three positions. And so forth. In the next example, the first position used to search product type contains the character “P”, and the next five positions (represented with five underscores) act as a placeholder for any value.

SQL Code

PROC SQL;

   SELECT PRODNAME,

          PRODTYPE,

          PRODCOST

     FROM PRODUCTS

       WHERE UPCASE(PRODTYPE) LIKE ‘P_____’;

QUIT;

Results

image shown here

The next example illustrates a pattern search of product name (PRODNAME) where the first three positions are represented as a wildcard; the fourth position contains the lowercase character “a”, followed by any combination of uppercase or lowercase characters.

 

SQL Code

PROC SQL;

  SELECT PRODNAME

    FROM PRODUCTS

      WHERE PRODNAME LIKE  ‘___a%’;

QUIT;

Results

image shown here

Testing for the Existence of a Value

The EXISTS predicate is used to test for the existence of a set of values. In the next example, a subquery is used to check for the existence of customers in the CUSTOMERS table with purchases from the PURCHASES table. More details on subqueries will be presented in Chapter 7, “Coding Complex Queries.”

SQL Code

PROC SQL;

  SELECT CUSTNUM,

         CUSTNAME,

         CUSTCITY

    FROM CUSTOMERS C

      WHERE EXISTS

        (SELECT *

          FROM PURCHASES P

            WHERE C.CUSTNUM = P.CUSTNUM);

QUIT;

Results

image shown here

CALCULATED Keyword

As described earlier, the SELECT statement’s purpose is to read data from one or more tables (or data sets). In addition to selecting columns that are stored in one or more tables, SQL can be used to dynamically create new columns containing text or computations, so they exist for the duration of the query. New columns can be created in a SELECT statement by specifying a column alias with an ‘AS’ keyword. The column name must adhere to the rules for SAS names and persists for that query only. In the next example, a name is assigned to the results of a new computed column (UNITCOST * UNITS) called, TOTALCOST, using the AS keyword and is formatted with a DOLLAR10.2 format for each row in the PURCHASES table.

SQL Code

PROC SQL;

  SELECT CUSTNUM

        ,UNITCOST

        ,UNITS

        ,UNITCOST * UNITS AS TOTALCOST FORMAT=DOLLAR10.2

    FROM PURCHASES;

QUIT;

Partial Results

image shown here

.   .   .                               .   .   .

image shown here

When an alias is specified to name a new column, the alias referencing the column can be used in the query. In the next example, a name and format is assigned to the results of the computed column, TOTALCOST, for each row in the PURCHASES table along with a WHERE clause to reference the computed column and to subset the results to display rows where the TOTALCOST is greater than $10,000.

SQL Code

PROC SQL;

  SELECT CUSTNUM

        ,UNITCOST

        ,UNITS

        ,UNITCOST * UNITS AS TOTALCOST FORMAT=DOLLAR10.2

    FROM PURCHASES

      WHERE TOTALCOST > 10000;

QUIT;

In the previous example you learned that we can perform computations in a SELECT statement and assign an alias to any new columns that we create. So why did SAS stop processing our query in this example and send an ERROR message to the SAS log? Earlier in this chapter, we examined the order of execution of each clause in an SQL query and learned that SAS processes the WHERE clause prior to the SELECT clause. Consequently, an error is produced, as is shown below, if the computed column is used in a WHERE clause as a condition because the WHERE clause is unaware that the computed column, TOTALCOST, exists.

SAS Log Results

        PROC SQL;

          SELECT CUSTNUM

                ,UNITCOST

                ,UNITS

                ,UNITCOST * UNITS AS TOTALCOST FORMAT=DOLLAR10.2

            FROM PURCHASES

              WHERE TOTALCOST > 10000;

 ERROR: The following columns were not found in the contributing tables: TOTALCOST.

 NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

        QUIT;

 NOTE: The SAS System stopped processing this step because of errors.

To correct the problem illustrated in the previous example, the keyword CALCULATED needs to be specified in the WHERE clause along with the alias to inform SAS that the new column is derived within the query. In the next example, a name and format is assigned to the computed column, TOTALCOST, for each row in the PURCHASES table along with a WHERE clause specifying the CALCULATED keyword followed by the alias to reference, subset and display the computed column’s results where the TOTALCOST is greater than $10,000.

SQL Code

PROC SQL;

  SELECT CUSTNUM

        ,UNITCOST

        ,UNITS

        ,UNITCOST * UNITS AS TOTALCOST FORMAT=DOLLAR10.2

    FROM PURCHASES

      WHERE CALCULATED TOTALCOST > 10000;

QUIT;

Results

image shown here

Dictionary Tables

SAS generates and maintains valuable runtime information (metadata) about SAS libraries, data sets, catalogs, indexes, macros, system options, titles, views, and other content in a collection of read-only tables called dictionary tables. Although called tables, dictionary tables are not real tables at all. Dictionary tables and their contents permit a SAS session’s activities to be easily accessed and monitored. Information is automatically produced and each table’s contents are made available at the time a SAS session begins. Table 2.2 presents the available Dictionary tables and a description of their contents.

Table 2.2: Available Dictionary Tables and Their Contents

Dictionary Table Name

Contents

DICTIONARY.CATALOGS

SAS catalogs

DICTIONARY.COLUMNS

Data set columns and attributes

DICTIONARY.EXTFILES

Allocated filerefs and external physical path

DICTIONARY.INDEXES

Data set indexes

DICTIONARY.MACROS

Global and automatic macro variables

DICTIONARY.MEMBERS

SAS data sets and other member types

DICTIONARY.OPTIONS

Current SAS system option settings

DICTIONARY.TABLES

SAS data sets and views

DICTIONARY.TITLES

Title and footnote definitions

DICTIONARY.VIEWS

SAS data views

Dictionary Tables and Metadata

SAS collects and populates valuable information (“metadata” or data about data) on SAS libraries, data sets (tables), catalogs, indexes, macros, system options, titles, views, and a collection of other read-only tables called dictionary tables. Dictionary tables serve a special purpose by providing system-related information about the current SAS session’s SAS databases and applications. When a query is requested against a dictionary table, SAS automatically launches a discovery process at runtime to collect information pertinent to that table. This information is made available any time after a SAS session is started.

While SAS 9.1 has 22 dictionary tables and SASHELP views, there are 29 “known” dictionary tables and SASHELP views in SAS 9.2, 30 “known” dictionary tables and SASHELP views in SAS 9.3, and 32 “known” dictionary tables and SASHELP views in SAS 9.4. The names of each DICTIONARY table and SASHELP view are illustrated in Table 2.3 below.

Table 2.3: DICTIONARY Tables and SASHELP Views

DICTIONARY Table

SASHELP View

Purpose

CATALOGS

VCATALG

Provides information about SAS catalogs.

CHECK_CONSTRAINTS

VCHKCON

Provides check constraints information.

COLUMNS

VCOLUMN

Provides information about column in tables.

CONSTRAINT_COLUMN_USAGE

VCNCOLU

Provides column integrity constraints information.

CONSTRAINT_TABLE_USAGE

VCNTABU

Provides information related to tables with integrity constraints defined.

DATAITEMS

VDATAIT

Provides information about known data items.

DESTINATIONS

VDEST

Provides information about known ODS destinations.

DICTIONARIES

VDCTNRY

Provides information about all the DICTIONARY tables.

ENGINES

VENGINE

Provides information about known SAS engines available to the session.

EXTFILES

VEXTFL

Provides information related to external files.

FILTERS

VFILTER

Provides information about known filters.

FORMATS

VFORMAT

Provides information related to defined formats and informats.

FUNCTIONS

VFUNC

Provides information about all known functions.

GOPTIONS

VGOPT

Provides information about currently defined SAS/GRAPH software graphics options.

INDEXES

VINDEX

Provides information related to defined indexes.

INFOMAPS

VINFOMP

Provides information about all known information maps.

LIBNAMES

VLIBNAM

Provides information related to defined SAS libraries.

MACROS

VMACRO

Provides information related to any defined macros.

MEMBERS

VMEMBER

Provides information related to objects currently defined in SAS libraries.

OPTIONS

VOPTION

Provides information related to SAS system options.

DICTIONARY Table

SASHELP View

Purpose

PROMPTS

VPROMPT

Provides information about all known SAS/GRAPH prompts.

PROMPTSXML

VPRMXML

Provides information about all known XML prompts.

REFERENTIAL_CONSTRAINTS

VREFCON

Provides information related to tables with referential constraints.

REMEMBER

VREMEMB

Provides information about all known remembered text.

STYLES

VSTYLE

Provides information related to select ODS styles.

TABLES

VTABLE

Provides information related to currently defined tables.

TABLE_CONSTRAINTS

VTABCON

Provides information related to tables containing integrity constraints.

TITLES

VTITLE

Provides information related to currently defined titles and footnotes.

VIEWS

VVIEW

Provides information related to currently defined data views.

Displaying Dictionary Table Definitions

You can view a dictionary table’s definition and enhance your understanding of each table’s contents by specifying a DESCRIBE TABLE statement. The results of the statements used to create each dictionary table can be displayed in the SAS log. For example, a DESCRIBE TABLE statement is illustrated below to display the CREATE TABLE statement used in building the OPTIONS dictionary table containing current SAS system option settings.

SQL Code

PROC SQL;

  DESCRIBE TABLE

    DICTIONARY.OPTIONS;

QUIT;

SAS Log Results

create table DICTIONARY.OPTIONS

  (

   optname char(32) label='Option Name',

   setting char(1024) label='Option Setting',

   optdesc char(160) label='Option Description',

   level char(8) label='Option Location'

  );

Note: The information contained in dictionary tables is also available to DATA and PROC steps outside the SQL procedure. Referred to as dictionary views, each view is prefaced with the letter “V” and may be shortened with abbreviated names. Dictionary view can be accessed by referencing the view by its name in the SASHELP library. See the SAS Procedures Guide for further details on accessing and using dictionary views in the SASHELP library.

Dictionary Table Column Names

To help become familiar with each dictionary table’s and dictionary view’s column names and their definitions, Tables 2.4 through 2.13 identify each unique column name, type, length, format, informat, and label.

Table 2.4: DICTIONARY.CATALOGS or SASHELP.VCATALG

Column

Type

Length

Format

Informat

Label

Libname

char

8

 

 

Library Name

Memname

char

32

 

 

Member Name

Memtype

char

8

 

 

Member Type

Objname

char

32

 

 

Object Name

Objtype

char

8

 

 

Object Type

Objdesc

char

256

 

 

Description

Created

num

 

DATETIME.

DATETIME.

Date Created

Modified

num

 

DATETIME.

DATETIME.

Date Modified

Alias

char

8

 

 

Object Alias

 

 

Table 2.5: DICTIONARY.COLUMNS or SASHELP.VCOLUMN

Column

Type

Length

Label

Libname

char

8

Library Name

Memname

char

32

Member Name

Memtype

char

8

Member Type

Name

char

32

Column Name

Type

char

4

Column Type

Length

num

 

Column Length

Npos

num

 

Column Position

Varnum

num

 

Column Number in Table

Label

char

256

Column Label

Format

char

16

Column Format

Informat

char

16

Column Informat

Idxusage

char

9

Column Index Type

 

Table 2.6: DICTIONARY.EXTFILES or SASHELP.VEXTFL

Column

Type

Length

Label

Fileref

char

8

Fileref

Xpath

char

1024

Path Name

Xengine

char

8

Engine Name

 

Table 2.7: DICTIONARY.INDEXES or SASHELP.VINDEX

Column

Type

Length

Label

Libname

char

8

Library Name

Memname

char

32

Member Name

Memtype

char

8

Member Type

Name

char

32

Column Name

Idxusage

char

9

Column Index Type

Indxname

char

32

Index Name

Indxpos

num

 

Position of Column in Concatenated Key

Nomiss

char

3

Nomiss Option

Unique

char

3

Unique Option

Table 2.8: DICTIONARY.MACROS or SASHELP.VMACRO

Column

Type

Length

Label

Scope

char

9

Macro Scope

Name

char

32

Macro Variable Name

Offset

num

 

Offset into Macro Variable

Value

char

200

Macro Variable Name

 

Table 2.9: DICTIONARY.MEMBERS or SASHELP.VMEMBER

Column

Type

Length

Label

Libname

char

8

Library Name

Memname

char

32

Member Name

Memtype

char

8

Member Type

Engine

char

8

Engine Name

Index

char

32

Indexes

Path

char

1024

Path Name

 

Table 2.10: DICTIONARY.OPTIONS or SASHELP.VOPTION

Column

Type

Length

Label

Optname

char

32

Option Name

Setting

char

1024

Option Setting

Optdesc

char

160

Option Description

Level

char

8

Option Location

 

Table 2.11: DICTIONARY.TABLES or SASHELP.VTABLE

Column

Type

Length

Format

Informat

Label

Libname

char

8

 

 

Library Name

Memname

char

32

 

 

Member Name

Memtype

char

8

 

 

Member Type

Memlabel

char

256

 

 

Dataset Label

Typemem

char

8

 

 

Dataset Type

Crdate

num

 

DATETIME.

DATETIME.

Date Created

Modate

num

 

DATETIME.

DATETIME.

Date Modified

 

Column

Type

Length

Format

Informat

Label

Nobs

num

 

 

 

# of Obs

Obslen

num

 

 

 

Obs Length

Nvar

num

 

 

 

# of Variables

Protect

char

3

 

 

Type of Password Protection

Compress

char

8

 

 

Compression Routine

Encrypt

char

8

 

 

Encryption

Npage

num

 

 

 

# of Pages

Pcompress

num

 

 

 

% Compression

Reuse

char

3

 

 

Reuse Space

Bufsize

num

 

 

 

Bufsize

Delobs

num

 

 

 

# of Deleted Obs

Indxtype

char

9

 

 

Type of Indexes

Datarep

char

32

 

 

Data Representation

Reqvector

char

24

$HEX.

$HEX.

Requirements Vector

 

Table 2.12: DICTIONARY.TITLES or SASHELP.VTITLE

Column

Type

Length

Label

Type

char

1

Title Location

Number

num

 

Title Number

Text

char

256

Title Text

 

Table 2.13: DICTIONARY.VIEWS or SASHELP.VVIEW

Column

Type

Length

Label

Libname

char

8

Library Name

Memname

char

32

Member Name

Memtype

char

8

Member Type

Engine

char

8

Engine Name

Accessing a Dictionary Table’s Contents

The content of a dictionary table is accessed with the SQL procedure’s SELECT statement FROM clause. Results are displayed as rows and columns in a table, and can be used in handling common data processing tasks including obtaining a list of allocated libraries, catalogs and data sets, as well as communicating SAS environment settings to custom software applications. Users should take the time to explore the capabilities of these read-only dictionary tables and become familiar with the type of information they provide.

Dictionary.CATALOGS

Obtaining detailed information about catalogs and their members is quick and easy with the CATALOGS dictionary table. You will be able to capture an ordered list of catalog information by member name including object name and type, description, date created and last modified, and object alias from any SAS library. For example, the following code produces a listing of the catalog objects in the SASUSER library.

Note: Because this dictionary table produces a considerable amount of information, users are advised to specify a WHERE clause when using.

SQL Code

PROC SQL;

  SELECT *

    FROM DICTIONARY.CATALOGS

      WHERE LIBNAME=”SASUSER”;

QUIT;

Results

image shown here

Dictionary.COLUMNS

Retrieving information about the columns in one or more data sets is easy with the COLUMNS dictionary table. Similar to the results of the CONTENTS procedure, you will be able to capture column-level information including column name, type, length, position, label, format, informat, and indexes, as well as produce cross-reference listings containing the location of columns in a SAS library. For example, the following code requests a cross-reference listing of the tables containing the CUSTNUM column in the WORK library.

Note: Care should be used when specifying multiple functions on the WHERE clause because the SQL Optimizer is unable to optimize the query resulting in all allocated SAS session librefs being searched. This can cause the query to run much longer than expected.

SQL Code

PROC SQL;

  SELECT *

    FROM DICTIONARY.COLUMNS

      WHERE UPCASE(LIBNAME)=”WORK” AND

            UPCASE(NAME)=”CUSTNUM”;

QUIT;

Results

image shown here

Dictionary.DICTIONARIES

Users can easily identify all available dictionary tables by accessing the read-only DICTIONARIES dictionary table or VDCTNRY SASHELP view. The contents of the DICTIONARIES dictionary table and VDCTNRY SASHELP view reveals the names of supported tables and views. The following PROC SQL query specifies the UNIQUE keyword to generate a listing of existing dictionary tables.

PROC SQL Code

PROC SQL;

  SELECT UNIQUE MEMNAME

    FROM DICTIONARY.DICTIONARIES;

QUIT;

 

Results

image shown here

Dictionary.EXTFILES

Accessing allocated external files by fileref and corresponding physical path name information is a snap with the EXTFILES dictionary table. The results from this handy table can be used in an application to communicate whether a specific fileref has been allocated with a FILENAME statement. For example, the following code produces a listing of each individual path name by fileref.

SQL Code

PROC SQL;

  SELECT *

    FROM DICTIONARY.EXTFILES;

QUIT;

Results

image shown here

Dictionary.INDEXES

It is sometimes useful to display the names of existing simple and composite indexes, or their SAS tables, that reference a specific column name. The INDEXES dictionary table provides important information to help identify indexes that improve a query’s performance. For example, to display indexes that reference the CUSTNUM column name in any of the example tables, the following code is specified.

Note:  See Chapter 12, “Tuning for Performance and Efficiency,” for performance tuning techniques as they relate to indexes.

SQL Code

PROC SQL;

  SELECT *

    FROM DICTIONARY.INDEXES

      WHERE UPCASE(NAME)=”CUSTNUM”      /* Column Name  */

            AND LIBNAME=”WORK”; /* Library Name */

QUIT;

Dictionary.MACROS

The ability to capture macro variable names and their values is available with the MACROS dictionary table. The MACROS dictionary table provides information for global and automatic macro variables, but not for local macro variables. For example, to obtain columns specific to macros such as global macros SQLOBS, SQLOOPS, SQLXOBS, or SQLRC, the following code is specified.

SQL Code

PROC SQL;

  SELECT *

    FROM DICTIONARY.MACROS

      WHERE UPCASE(SCOPE)=”GLOBAL”;

QUIT;

Results

image shown here

Dictionary.MEMBERS

Accessing a detailed list of data sets, views, and catalogs is the hallmark of the MEMBERS dictionary table. You will be able to access a terrific resource of information by library, member name and type, engine, indexes, and physical path name. For example, to obtain a list of the individual files in the WORK library, the following code is specified.

SQL Code

PROC SQL;

  SELECT *

    FROM DICTIONARY.MEMBERS

      WHERE LIBNAME=”WORK”;

QUIT;

 

Results

image shown here

Dictionary.OPTIONS

The OPTIONS dictionary table provides a list of the current SAS session’s option settings including the option name, its setting, description, and location. Obtaining option settings is as easy as 1-2-3. Simply submit the following SQL query referencing the OPTIONS dictionary table as follows. A partial listing of the results from the OPTIONS dictionary table is displayed below in rich text format.

SQL Code

PROC SQL;

  SELECT *

    FROM DICTIONARY.OPTIONS;

QUIT;

 

Results

image shown here

Dictionary.TABLES

When you need more information about SAS files than what the MEMBERS dictionary table provides, consider using the TABLES dictionary table. The TABLES dictionary table provides such file details as library name, member name and type, date created and last modified, number of observations, observation length, number of variables, password protection, compression, encryption, number of pages, reuse space, buffer size, number of deleted observations, type of indexes, and requirements vector. For example, to obtain a detailed list of files in the WORK library, the following code is specified.

Note: Because the TABLES dictionary table produces a considerable amount of information, users should specify a WHERE clause when using it.

 

SQL Code

PROC SQL;

  SELECT *

    FROM DICTIONARY.TABLES

      WHERE LIBNAME=”WORK”;

QUIT;

Results

image shown here

Dictionary.TITLES

The TITLES dictionary table provides a listing of the currently defined titles and footnotes in a session. The table output distinguishes between titles and footnotes using a “T” or “F” in the TITLE LOCATION column. For example, the following code displays a single title and two footnotes.

SQL Code

PROC SQL;

  SELECT *

    FROM DICTIONARY.TITLES;

QUIT;

 

Results

image shown here

Dictionary.VIEWS

The VIEWS dictionary table provides a listing of views for selected SAS libraries. The result of the VIEWS dictionary table displays the library name, member names and type, and engine used. For example, the following code displays a single view called VIEW_CUSTOMERS from the WORK library.

SQL Code

PROC SQL;

  SELECT *

    FROM DICTIONARY.VIEWS

      WHERE LIBNAME=”WORK”;

QUIT;

Results

image shown here

 

Summary

1.       The available data types in SQL are 1) numeric and 2) character (see the “Data Types Overview” section).

2.       When a table is created with PROC SQL, numeric columns are assigned a default length of 8 bytes (see the “Numeric Data” section).

3.       SAS tables store date and time information in the form of a numeric data type (see the “Date and Time Column Definitions” section).

4.       A CHAR column stores a default of 8 characters (see the “Character Data” section).

5.       Comparison operators are used in the SQL procedure to compare one character or numeric value to another (see the “Comparison Operators” section).

6.       Logical operators are used to connect one or more expressions together in a WHERE clause and consist of AND, OR, and NOT (see the see the “Logical Operators” section).

7.       The arithmetic operators used in the SQL procedure are the same as those used in the DATA step (see the “Arithmetic Operators” section).

8.       Character string operators and functions are typically used with character data (see the “Character String Operators and Functions” section).

9.       Predicates are used in the SQL procedure to perform direct comparisons between two conditions or expressions (see the “Predicates” section).

10.     Missing or unknown information is supported by PROC SQL in a form known as a null value. A null value is not the same as a zero value (see the “Missing Values and Null” section).

11.     When a new column is derived within a query, the CALCULATED keyword is specified before the column name, or alias, in a WHERE clause.

12.     Dictionary tables provide information about the SAS environment (see the “Dictionary Tables” section).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.139.83.151