2.3. SQL Operators and Functions

SQL programmers have a number of ways to accomplish their objectives, particularly when the goal is to retrieve and work with data. The SELECT statement is an extremely powerful statement in the SQL language. Its syntax can be somewhat complex because of the number of ways that columns, tables, operators, functions, and predicates can be combined into executable statements.

There are several types of operators and functions in PROC SQL: 1) comparison operators, 2) logical operators, 3) arithmetic operators, 4) character string operators, 5) summary functions, and 6) predicates. Operators and functions provide value-added features for PROC SQL programmers. Each will be presented below.

2.3.1. Comparison Operators

Comparison operators are used in the SQL procedure to compare one character or numeric value to another. As in the DATA step, SQL comparison operators, mnemonics, and their descriptions appear in the following table.

SAS OperatorMnemonic OperatorDescription
=EQEqual to
^= or ¬=NENot equal to
<LTLess than
<=LELess than or equal to
>GTGreater than
>=GEGreater than or equal to

Suppose you want to select only those products from the PRODUCTS table costing more than $300.00. The example below illustrates the use of the greater than sign (>) in a WHERE clause to select products meeting the condition.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         PRODTYPE,
         PRODCOST
    FROM PRODUCTS
      WHERE PRODCOST > 300;
QUIT;

Results

                    The SAS System

                                               
 Product
  Product Name               Product Type         
 Cost
  ________________________________________________
______
  Dream Machine              Workstation      $3
,200.00
  Business Machine           Workstation      $3
,300.00
  Travel Laptop              Laptop           $3
,400.00
  Database Software          Software          
 $399.00


PROC SQL also supports the use of truncated string comparison operators. These operators work by first truncating the longer string to the same length as the shorter string, and then performing the specified comparison. Using any of the comparison operators has no permanent effect on the strings themselves. The list of truncated string comparison operators and their meanings appears below.

Truncated String Comparison OperatorDescription
EQTEqual to
GTTGreater than
LTTLess than
GETGreater than or equal to
LETLess than or equal to
NETNot equal to

2.3.2. Logical Operators

Logical operators are used to connect two or more expressions together in a WHERE or HAVING clause. The available logical operators are AND, OR, and NOT. Suppose you want to select only those software products costing more than $300.00. The example illustrates how the AND operator is used to ensure that both conditions are true.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         PRODTYPE,
         PRODCOST
    FROM PRODUCTS
      WHERE PRODTYPE = 'Software' AND
            PRODCOST > 300;
QUIT;

Results

                     The SAS System

                                               
 Product
  Product Name               Product Type         
 Cost
  ________________________________________________
_____
  Database Software          Software          
 $399.00


The next example illustrates the use of the OR logical operator to select software products or products that cost more than $300.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         PRODTYPE,
         PRODCOST
    FROM PRODUCTS
      WHERE PRODTYPE = 'Software' OR
            PRODCOST > 300;
QUIT;

Results

                      The SAS System

                                               
 Product
  Product Name               Product Type         
 Cost
  ________________________________________________
_____
  Dream Machine              Workstation      $3
,200.00
  Business Machine           Workstation      $3
,300.00
  Travel Laptop              Laptop           $3
,400.00
  Spreadsheet Software       Software          
 $299.00
  Database Software          Software          
 $399.00
  Wordprocessor Software     Software          
 $299.00
  Graphics Software          Software          
 $299.00


The next example illustrates the use of the NOT logical operator to select products that are not software products and do not cost more than $300.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         PRODTYPE,
         PRODCOST
    FROM PRODUCTS
      WHERE NOT PRODTYPE = 'Software' AND
            NOT PRODCOST > 300;
QUIT;

Results

                      The SAS System

                                               
 Product
  Product Name               Product Type         
 Cost
  ________________________________________________
_____
  Analog Cell Phone          Phone              
 $35.00
  Digital Cell Phone         Phone             
 $175.00
  Office Phone               Phone             
 $130.00


2.3.3. Arithmetic Operators

The arithmetic operators used in PROC SQL are the same as those used in the DATA step as well as those found in other languages like C, Pascal, FORTRAN, and COBOL. The arithmetic operators available in the SQL procedure appear below.

OperatorDescription
+Addition
-Subtraction
*Multiplication
/Division
**Exponent (raises to a power)
=Equals

To illustrate how arithmetic operators are used, suppose you want to apply a discount of 20% to the product price (PRODCOST) in the PRODUCTS table and display the results in ascending order by the discounted price. Note that the computed column (PRODCOST * 0.80) does not automatically create a column header on output.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         PRODTYPE,
         PRODCOST * 0.80
    FROM PRODUCTS;
QUIT;

Results

                   The SAS System

   Product Name               Product Type
  ________________________________________________
_____
   Dream Machine              Workstation         
 2560
   Business Machine           Workstation         
 2640
   Travel Laptop              Laptop              
 2720
   Analog Cell Phone          Phone               
   28
   Digital Cell Phone         Phone               
  140
   Office Phone               Phone               
  104
   Spreadsheet Software       Software           
 239.2
   Database Software          Software           
 319.2
   Wordprocessor Software     Software           
 239.2
   Graphics Software          Software           
 239.2


In the next example, suppose you wanted to reference a column that was calculated in the SELECT statement. PROC SQL allows references to a computed column in the same SELECT statement (or a WHERE clause) using the CALCULATED keyword. Note that the computed columns have column aliases created for them using the AS keyword. If the CALCULATED keyword were not specified preceding the calculated column, an error would have been generated.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         PRODTYPE,
         PRODCOST * 0.80 AS DISCOUNT_PRICE
                  FORMAT=DOLLAR9.2,
         PRODCOST – CALCULATED DISCOUNT_PRICE AS LOSS
                  FORMAT=DOLLAR7.2
    FROM PRODUCTS
      ORDER BY 3;
QUIT;

Results

                            The SAS System

                                                 
 DISCOUNT_
   Product Name               Product Type        
    PRICE     LOSS
   _______________________________________________
__________________
   Analog Cell Phone          Phone               
   $28.00    $7.00
   Office Phone               Phone               
  $104.00   $26.00
   Digital Cell Phone         Phone               
  $140.00   $35.00
   Spreadsheet Software       Software            
  $239.20   $59.80
   Graphics Software          Software            
  $239.20   $59.80
   Wordprocessor Software     Software            
  $239.20   $59.80
   Database Software          Software            
  $319.20   $79.80
   Dream Machine              Workstation        
 $2,560.00  $640.00
   Business Machine           Workstation        
 $2,640.00  $660.00
   Travel Laptop              Laptop             
 $2,720.00  $680.00


2.3.4. Character String Operators and Functions

Character string operators and functions are typically used with character data. Numerous operators are presented to make you aware of the power available with the SQL procedure. As you become familiar with each of the operators, you’ll find their real strength as you begin to nest functions within each other.

2.3.4.1. Concatenating Strings Together

The following example illustrates a basic concatenation operator that is used to concatenate two columns and a text string. Note that the created column is not labeled. The concatenation operator will be discussed in greater detail in Chapter 3, “Formatting Output.”

SQL Code

PROC SQL;
  SELECT MANUCITY || "," || MANUSTAT
    FROM MANUFACTURERS;
QUIT;

                    The SAS System

               _______________________
               Houston             ,TX
               San Diego           ,CA
               Miami               ,FL
               San Mateo           ,CA
               San Diego           ,CA
               San Diego           ,CA


2.3.4.2. Finding the Length of a String

The LENGTH function is used to obtain the length of a character string column. LENGTH returns a number equal to the number of characters in the argument. Note that the computed column (LENGTH(PRODNAME)) has a column header created for it called Length by specifying the AS keyword. This example illustrates using the LENGTH function to determine the length of data values.

SQL Code

PROC SQL;
  SELECT PRODNUM,
         PRODNAME,
         LENGTH(PRODNAME) AS Length
    FROM PRODUCTS;
QUIT;

Results

                    The SAS System

      Product
       Number  Product Name                 Length
     _____________________________________________
         1110  Dream Machine                    13
         1200  Business Machine                 16
         1700  Travel Laptop                    13
         2101  Analog Cell Phone                17
         2102  Digital Cell Phone               18
         2200  Office Phone                     12
         5001  Spreadsheet Software             20
         5002  Database Software                17
         5003  Wordprocessor Software           22
         5004  Graphics Software                17


2.3.4.3. Combining Functions and Operators

As in the DATA step, many functions can be used in the SQL procedure. To modify one or more existing rows in a table, the UPDATE statement is used (see Chapter 6, “Modifying and Updating Tables and Indexes,” for more details). The UPDATE statement with SET clause changes the contents of a data value (functioning the same way as a DATA step assignment statement) by assigning a new value to the column identified to the left of the equal sign by a constant or expression referenced to the right of the equal sign.

The UPDATE statement does not automatically produce any output, except for the log messages based on the operation results itself. To illustrate the use of DATA step functions and operators in the SQL procedure, the next example shows a SCAN function that isolates the first piece of information from product name (PRODNAME), a TRIM function to remove trailing blanks from product type (PRODTYPE), and a concatenation operator “||” that concatenates the first character expression with the second expression. Exercise care when using the SCAN function because it returns a 200-byte string.

SQL Code

PROC SQL;
  UPDATE PRODUCTS
    SET PRODNAME = SCAN(PRODNAME,1) || TRIM(PRODTYPE);
QUIT;

SAS Log Results

       PROC SQL;
         UPDATE PRODUCTS
           SET PRODNAME = SCAN(PRODNAME,1) || TRIM
(PRODTYPE);
  NOTE: 10 rows were updated in PRODUCTS.
       QUIT;


An optional WHERE clause can be specified limiting the number of rows that modifications will be applied to. The next example illustrates using a WHERE clause to restrict the number of rows that are updated in the previous example to just “phones”, excluding all the other rows.

SQL Code

PROC SQL;
  UPDATE PRODUCTS
    SET PRODNAME = SCAN(PRODNAME,1) || TRIM(PRODTYPE)
      WHERE PRODTYPE IN ('Phone'),
QUIT;

SAS Log Results

       PROC SQL;
         UPDATE PRODUCTS
           SET PRODNAME = SCAN(PRODNAME,1) || TRIM
(PRODTYPE)
             WHERE PRODTYPE IN ('Phone'),
  NOTE: 3 rows were updated in PRODUCTS.
       QUIT;


2.3.4.4. Aligning Characters

The default alignment for character data is to the left. However, character columns or expressions can also be aligned to the right. Two functions are available for character alignment: LEFT and RIGHT. The next example combines the concatenation operator “||” and the TRIM function with the LEFT function to left align a character expression while inserting a comma “,” and blank between the columns.

SQL Code

PROC SQL;
  SELECT LEFT(TRIM(MANUCITY) || ", " || MANUSTAT)
    FROM MANUFACTURERS;
QUIT;

Results

                    The SAS System

                _______________________
                Houston, TX
                San Diego, CA
                Miami, FL
                San Mateo, CA
                San Diego, CA
                San Diego, CA


The next example illustrates how character data can be right aligned using the RIGHT function.

SQL Code

PROC SQL;
  SELECT RIGHT(MANUCITY)
    FROM MANUFACTURERS;
QUIT;

Results

                   The SAS System

                 ____________________
                              Houston
                            San Diego
                                Miami
                            San Mateo
                            San Diego
                            San Diego


2.3.4.5. Finding the Occurrence of a Pattern with INDEX

To find the occurrence of a pattern, the INDEX function can be used. Frequently, requirements call for a column to be searched using a specific character string. The INDEX function can be used in the SQL procedure to search for patterns in a character string. The character string is searched from left to right for the first occurrence of the specified value. If the desired string is found, the column position of the first character is returned. Otherwise a value of zero (0) is returned. The following arguments are used to search for patterns in a column: the character column or expression and the character string to search for. To find all products with the characters “phone” in the product name (PRODNAME) column, the following code can be specified:

SQL Code

PROC SQL;
  SELECT PRODNUM,
         PRODNAME,
         PRODTYPE
    FROM PRODUCTS
      WHERE INDEX(PRODNAME, 'phone') > 0;
QUIT;

Results

      PROC SQL;
        SELECT PRODNUM,
               PRODNAME,
               PRODTYPE
          FROM PRODUCTS
            WHERE INDEX(PRODNAME, 'phone') > 0;
 NOTE: No rows were selected.
      QUIT;


2.3.4.6. Analysis

As in the DATA step, no rows were selected because the search is case-sensitive and “phone” is specified as all lowercase characters.

2.3.4.7. Changing the Case in a String

The SAS System provides two functions that enable you to change the case of a string’s characters: LOWCASE and UPCASE. The LOWCASE function converts all of the characters in a string or expression to lowercase characters. The UPCASE function converts all of the characters in a string or expression to uppercase characters.

In the previous example, the results of the search were negative even though the character string “phone” appeared multiple times in more than one row. In order to make this search recognize all the possible lower- and uppercase variations of the word “phone”, the search criteria in the WHERE clause could be made “smarter” by combining an UPCASE function with the INDEX function as follows.

SQL Code

PROC SQL;
  SELECT PRODNUM,
         PRODNAME,
         PRODTYPE
    FROM PRODUCTS
      WHERE INDEX(UPCASE(PRODNAME), 'PHONE') > 0;
QUIT;

Results

                     The SAS System

    Product
     Number  Product Name               Product Type
   _______________________________________________
_____
       2101  Analog Cell Phone          Phone
       2102  Digital Cell Phone         Phone
       2200  Office Phone               Phone


In the next example, the LOWCASE function is combined with the INDEX function to produce the identical output from the previous example.

SQL Code

PROC SQL;
  SELECT PRODNUM,
         PRODNAME,
         PRODTYPE
    FROM PRODUCTS
      WHERE INDEX(LOWCASE(PRODNAME), 'phone') > 0;
QUIT;

Results

                     The SAS System
    Product
     Number  Product Name               Product Type
   _______________________________________________
_____
       2101  Analog Cell Phone          Phone
       2102  Digital Cell Phone         Phone
       2200  Office Phone               Phone


2.3.4.8. Extracting Information from a String

Occasionally, processing requirements call for specific pieces of information to be extracted from a column. In these situations the SUBSTR function can be used with a character column by specifying a starting position and the number of characters to extract. The following example illustrates how the SUBSTR function is used to capture the first 4 bytes from the product type (PRODTYPE) column.

SQL Code

PROC SQL;
  SELECT PRODNUM,
         PRODNAME,
         PRODTYPE,
         SUBSTR(PRODTYPE,1,4)
    FROM PRODUCTS
      WHERE PRODCOST > 100.00;
QUIT;

Results

                        The SAS System
   Product
    Number  Product Name               Product Type
   _______________________________________________
_______________
      1110  Dream Machine              Workstation
      Work
      1200  Business Machine           Workstation
      Work
      1700  Travel Laptop              Laptop     
      Lapt
      2102  Digital Cell Phone         Phone      
      Phon
      2200  Office Phone               Phone      
      Phon
      5001  Spreadsheet Software       Software   
      Soft
      5002  Database Software          Software   
      Soft
      5003  Wordprocessor Software     Software   
      Soft
      5004  Graphics Software          Software   
      Soft


2.3.4.9. Phonetic Matching (Sounds-Like Operator =*)

A technique for finding names that sound alike or have spelling variations is available in the SQL procedure. This frequently used technique is referred to as phonetic matching and is performed using the Soundex algorithm. In Joe Celko’s book, SQL for Smarties: Advanced SQL Programming (pages 83-87), he traced the origins of the Soundex algorithm to the developers Margaret O’Dell and Robert C. Russell in 1918. Developed before the first computer, this technique was often used by clerks to manually search for similar sounding names.

Although not technically a function, the sounds-like operator “=*” searches and selects character data based on two expressions: the search value and the matched value. Anyone that has looked for a last name in a local telephone directory is quickly reminded of the possible phonetic variations.

To illustrate how the sounds-like operator works, we will search on last name in a table called CUSTOMERS2. The CUSTOMERS2 table is illustrated below. Although each name has phonetic variations and sounds the same, the results of “Laughler,” “Loffler,” and “Laffler” are spelled differently (illustrated below). The following PROC SQL code uses the sounds-like operator to find all customers that sound like “Lafler”.

CUSTOMERS2 Table
CUSTNUM    CUSTNAME    CUSTCITY
______________________________________
    1      Smith       San Diego
    7      Lafler      Spring Valley
   11      Jones       Carmel
   13      Thompson    Miami
    7      Loffler     Spring Valley
    1      Smithe      San Diego
    7      Laughler    Spring Valley
    7      Laffler     Spring Valley

SQL Code

PROC SQL;
  SELECT CUSTNUM,
         CUSTNAME,
         CUSTCITY
    FROM CUSTOMERS2
      WHERE CUSTNAME =* 'Lafler';
QUIT;

Results

                    The SAS System

       CUSTNUM  CUSTNAME         CUSTCITY
       _________________________________________
       7        Lafler           Spring Valley
       7        Loffler          Spring Valley
       7        Laffler          Spring Valley


Readers familiar with the DATA step SOUNDEX(argument) function to search a string are cautioned that it cannot be used in an SQL WHERE clause. Instead the sounds-like operator “=*” must be specified; otherwise a result of no rows will be selected.

Notice that only three of the four possible phonetic matches were selected in the preceding example (that is, Lafler, Loffler, and Laffler). The fourth possibility “Laughler” was not chosen as a “matched” value in the search by the sounds-like algorithm. In an attempt to overcome the inherent limitation with the sounds-like operator, as described in Celko’s SQL for Smarties (see earlier reference), and to derive a broader list of “matched” values, programmers should make every attempt to develop a comprehensive list of search values to widen the scope of possibilities. We can expand our original search criteria in the previous example to include the missing possibilities using OR logic.

SQL Code

PROC SQL;
  SELECT CUSTNUM,
         CUSTNAME,
         CUSTCITY
    FROM CUSTOMERS2
      WHERE CUSTNAME =* 'Lafler'   OR
            CUSTNAME =* 'Laughler' OR
            CUSTNAME =* 'Lasler';
QUIT;

Results

                    The SAS System

        CUSTNUM  CUSTNAME         CUSTCITY
        _________________________________________
        7        Lafler           Spring Valley
        7        Loffler          Spring Valley
        7        Laughler         Spring Valley
        7        Laffler          Spring Valley


2.3.4.10. Finding the First Nonmissing Value

The first example provides a way to find the first nonmissing value in a column or list. Specified in a SELECT statement, the COALESCE function inspects a column, or, in the case of a list, scans the arguments from left to right, and returns the first nonmissing or non-null value. If all values are missing, the result is missing. To take advantage of the COALESCE function, use arguments all of the same data type. The next example illustrates one approach to computing the total cost for each product purchased based on the number of units and unit costs columns in the PURCHASES table. If either the UNITS or UNITCOST columns contain a missing value, a zero is assigned to prevent the propagation of missing values.

SQL Code

PROC SQL;
  SELECT CUSTNUM,
         ITEM,
         UNITS,
         UNITCOST,
         (COALESCE(UNITS, 0) * COALESCE(UNITCOST, 0))
               AS Totcost FORMAT=DOLLAR6.2
    FROM PURCHASES;
QUIT;

Results

                   The SAS System

  Custnum  Item            Units      Unitcost 
 Totcost
  ________________________________________________
______

        1  Chair               1       $179.00  
 179.00
        1  Pens               12         $0.89  
 $10.68
        1  Paper               4         $6.95  
 $27.80
        1  Stapler             1         $8.95   
 $8.95
        7  Mouse Pad           1        $11.79  
 $11.79
        7  Pens               24         $1.59  
 $38.16
       13  Markers             .         $0.99   
 $0.00


2.3.4.11. Producing a Row Number

A unique undocumented, but unsupported, feature for producing a row (observation) count can be obtained with the MONOTONIC( ) function. Similar to the row numbers produced and displayed on output from the PRINT procedure (without the NOOBS option specified), the MONOTONIC() function displays row numbers too. The MONOTONIC() function automatically creates a column (variable) in the output results or in a new table. Because this is an undocumented feature and not supported in the SQL procedure, you are cautioned that it is possible to obtain duplicates or missing values with the MONOTONIC() function. The next example illustrates the creation of a row number using the MONOTONIC() function in a SELECT statement.

SQL Code

PROC SQL;
  SELECT MONOTONIC() AS Row_Number FORMAT=COMMA6.,
         ITEM,
         UNITS,
         UNITCOST
    FROM PURCHASES;
QUIT;

Results

                  The SAS System

    Row_Number  Item           Units      Unitcost
    ______________________________________________

             1  Chair              1       $179.00
             2  Pens              12         $0.89
             3  Paper              4         $6.95
             4  Stapler            1         $8.95
             5  Mouse Pad          1        $11.79
             6  Pens              24         $1.59
             7  Markers            .         $0.99


A row number can also be produced with the documented and supported SQL procedure option NUMBER. Unlike the MONOTONIC() function, the NUMBER option does not create a new column in a new table. The NUMBER option is illustrated below.

SQL Code

PROC SQL NUMBER;
  SELECT ITEM,
         UNITS,
         UNITCOST
    FROM PURCHASES;
QUIT;

Results

                 The SAS System

         Row  Item           Units      Unitcost
      __________________________________________

           1  Chair              1       $179.00
           2  Pens              12         $0.89
           3  Paper              4         $6.95
           4  Stapler            1         $8.95
           5  Mouse Pad          1        $11.79
           6  Pens              24         $1.59
           7  Markers            .         $0.99


2.3.5. Summarizing Data

The SQL procedure is a wonderful tool for summarizing (or aggregating) data. It provides a number of useful summary (or aggregate) functions to help perform calculations, descriptive statistics, and other aggregating operations in a SELECT statement or HAVING clause. These functions are designed to summarize information and not display detail about data.

Without the availability of summary functions, you would have to construct the necessary logic using somewhat complicated SQL programming constructs. When using a summary function without a GROUP BY clause (see Chapter 3 for details), all the rows in a table are treated as a single group. Consequently, the results are often a single row value.

A number of summary functions are available including facilities to count nonmissing values; determine the minimum and maximum values in specific columns; return the range of values; compute the mean, standard deviation, and variance of specific values; and perform other aggregating functions. The following table is an alphabetical list of the available summary functions. When multiple names for the same function are available, the ANSI-approved name appears first.

Summary Functions
Summary FunctionDescription
AVG, MEANAverage or mean of values
COUNT, FREQ, NAggregate number of nonmissing values
CSSCorrected sum of squares
CVCoefficient of variation
MAXLargest value
MINSmallest value
NMISSNumber of missing values
PRTProbability of a greater absolute value of Student’s t
RANGEDifference between the largest and smallest values
STDStandard deviation
STDERRStandard error of the mean
SUMSum of values
SUMWGTSum of the weight variable values, which is 1
TTesting the hypothesis that the population mean is zero
USSUncorrected sum of squares
VARVariance

The next example uses the COUNT function with the (*) argument to produce a total number of rows, regardless if data is missing. The asterisk (*) is specified as the argument to the COUNT function to count all rows in the PURCHASES table.

SQL Code

PROC SQL;
  SELECT COUNT(*) AS Row_Count
    FROM PURCHASES;
QUIT;

Results

                    The SAS System

                     Row_Count
                     _________

                             7


Unlike the COUNT(*) function syntax that counts all rows, regardless if data is missing or not, the next example uses the COUNT function with the (column-name) argument to produce a total number of nonmissing rows based on the column UNITS.

SQL Code

PROC SQL;
  SELECT COUNT(UNITS) AS Non_Missing_Row_Count
    FROM PURCHASES;
QUIT;

Results

                    The SAS System

                     Non_Missing_
                      Row_Count
                     ____________

                               6


The MIN summary function can be specified to determine what the least expensive product is in the PRODUCTS table.

SQL Code

PROC SQL;
  SELECT MIN(prodcost) AS Cheapest
            Format=dollar9.2 Label='Least Expensive'
    FROM PRODUCTS;
QUIT;

Results

                    The SAS System

                        Least
                    Expensive
                    _________

                       $35.00


In the next example, the SUM function is specified to sum numeric data types for a selected column. Suppose you wanted to determine the total costs of all purchases by customers who bought pens and markers. You could construct the following query to sum all nonmissing values for customers who purchased pens and markers in the PURCHASES table as follows.

SQL Code

PROC SQL;
  SELECT SUM((UNITS) * (UNITCOST))
               AS Total_Purchases FORMAT=DOLLAR6.2
    FROM PURCHASES
      WHERE UPCASE(ITEM)='PENS' OR
            UPCASE(ITEM)='MARKERS';
QUIT;

Results

                        Total_
                     Purchases
                  ____________

                        $48.84


Data can also be summarized down rows (observations) as well as across columns (variables). This flexibility gives SAS users an incredible range of power and the ability to take advantage of several SAS-supplied (or built-in) summary functions. These techniques permit the average of quantities rather than the set of all quantities. Without the ability to summarize data in PROC SQL, users would be forced to write complicated formulas and/or routines or even write and test DATA step programs to summarize data. Two examples illustrate how SQL can be constructed to summarize data: 1) summarizing data down rows and 2) summarizing data across columns.

  1. Summarizing data down rows

The SQL procedure can be used to produce a single aggregate value by summarizing data down rows (or observations). The advantage of using a summary function in PROC SQL is that it will generally compute the aggregate quicker than if a user-defined equation were constructed. It also saves the effort of having to construct and test a program containing the user-defined equation in the first place. Suppose you wanted to know the average product cost for all software in the PRODUCTS table containing a variety of products. The following query computes the average product cost and produces a single aggregate value using the AVG function.

SQL Code

PROC SQL;
 SELECT AVG(PRODCOST) AS
      Average_Product_Cost
  FROM PRODUCTS
   WHERE UPCASE(PRODTYPE) IN
         ('SOFTWARE'),
QUIT;

Results

                        Average_
                   Product_Cost
                             324


  1. Summarizing data across columns

When a computation is needed on two or more columns in a row, the SQL procedure can be used to summarize data across columns. Suppose you wanted to know the average cost of products in inventory. The next example computes the average inventory cost for each product without using a summary function, and once computed displays the value for each row as Average_Price.

SQL Code

PROC SQL;
 SELECT PRODNUM,
        (INVPRICE / INVQTY) AS
          Averge_Price
            FORMAT=DOLLAR8.2
  FROM INVOICE;
QUIT;

Results

                       Product
                 Number  Averge_Price
               ______________________

                   5001       $299.00
                   6001       $799.00
                   2101        $35.00
                   1110      $3200.00
                   5002       $399.00
                   6000        $99.00
                   1200      $3300.00


2.3.6. Predicates

Predicates are used in PROC SQL to perform direct comparisons between two conditions or expressions. Six predicates will be looked at:

  • BETWEEN

  • IN

  • IS NULL, IS MISSING

  • LIKE

  • EXISTS

2.3.6.1. Selecting a Range of Values

The BETWEEN predicate is a way of simplifying a query by selecting column values within a designated range of values. BETWEEN is equivalent to one LE (less than or equal) and one GE (greater than or equal) condition being ANDed together. It is extremely flexible because it works with character, numeric, and date values. Programmers can also combine two or more BETWEEN predicates with AND or OR operators for more complicated conditions. In the next example a range of products costing between $200 and $500 inclusively are selected from the PRODUCTS table.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         PRODTYPE,
         PRODCOST
    FROM PRODUCTS
      WHERE PRODCOST BETWEEN 200 AND 500;
QUIT;

Results

                     The SAS System

                                               
 Product
  Product Name               Product Type         
 Cost
  ________________________________________________
_____
  Spreadsheet Software       Software          
 $299.00
  Database Software          Software          
 $399.00
  Wordprocessor Software     Software          
 $299.00
  Graphics Software          Software          
 $299.00


In the next example, products are selected from the INVENTORY table that were ordered between the years 1999 and 2000. The YEAR function returns the year portion from a SAS date value and is used as the range of values in the WHERE clause.

SQL Code

PROC SQL;
  SELECT PRODNUM,
         INVENQTY,
         ORDDATE
    FROM INVENTORY
      WHERE YEAR(ORDDATE) BETWEEN 1999 AND 2000;
QUIT;

Results

                    The SAS System


                                       Date
                                  Inventory
             Product  Inventory        Last
              Number   Quantity     Ordered
            _______________________________
                1110         20  09/01/2000
                1700         10  08/15/2000
                5001          5  08/15/2000
                5002          3  08/15/2000
                5003         10  08/15/2000
                5004         20  09/01/2000
                5001          2  09/01/2000


The BETWEEN predicate and OR operator are used together in the next example to select products ordered between 1999 and 2000 or where inventory quantities are greater than 15. The YEAR function returns the year portion from a SAS date value and is used as the range of values in the WHERE clause.

SQL Code

PROC SQL;
  SELECT PRODNUM,
         INVENQTY,
         ORDDATE
    FROM INVENTORY
      WHERE (YEAR(ORDDATE) BETWEEN 1999 AND 2000) OR
             INVENQTY > 15;
QUIT;

Results

                    The SAS System


                                       Date
                                  Inventory
             Product  Inventory        Last
              Number   Quantity     Ordered
            _______________________________
                1110         20  09/01/2000
                1700         10  08/15/2000
                5001          5  08/15/2000
                5002          3  08/15/2000
                5003         10  08/15/2000
                5004         20  09/01/2000
                5001          2  09/01/2000


2.3.6.2. Selecting Nonconsecutive Values

The IN predicate selects one or more rows based on the matching of one or more column values in a set of values. The IN predicate creates an OR condition between each value and returns a Boolean value of true if a column value is equal to one or more of the values in the expression list. Although the IN predicate can be specified with single column values, it may be less costly to specify the “=” sign instead. In the next example, the “=” sign is used rather than the IN predicate to select phones from the PRODUCTS table.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         PRODTYPE,
         PRODCOST
    FROM PRODUCTS
      WHERE UPCASE(PRODTYPE) = 'PHONE';
QUIT;

Results

                     The SAS System

                                               
 Product
  Product Name               Product Type         
 Cost
  ________________________________________________
_____
  Analog Cell Phone          Phone              
 $35.00
  Digital Cell Phone         Phone             
 $175.00
  Office Phone               Phone             
 $130.00


In the next example, both phones and software products are selected from the PRODUCTS table. To avoid having to specify two OR conditions, you can specify the IN predicate.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         PRODTYPE,
         PRODCOST
    FROM PRODUCTS
      WHERE UPCASE(PRODTYPE) IN ('PHONE', 'SOFTWARE'),
QUIT;

Results

                     The SAS System

                                               
 Product
  Product Name               Product Type         
 Cost
  ________________________________________________
_____
  Analog Cell Phone          Phone              
 $35.00
  Digital Cell Phone         Phone             
 $175.00
  Office Phone               Phone             
 $130.00
  Spreadsheet Software       Software          
 $299.00
  Database Software          Software          
 $399.00
  Wordprocessor Software     Software          
 $299.00
  Graphics Software          Software          
 $299.00


2.3.6.3. Testing for Null or Missing Values

The IS NULL predicate is the ANSI approach to selecting one or more rows by evaluating whether a column value is missing or null (see earlier section on null values). The next example selects products from the INVENTORY table that are out-of-stock in inventory.

SQL Code

PROC SQL;
  SELECT PRODNUM,
         INVENQTY,
         INVENCST
    FROM INVENTORY
      WHERE INVENQTY IS NULL;
QUIT;

SAS Log Results

      PROC SQL;
        SELECT PRODNUM,
               INVENQTY,
               INVENCST
          FROM INVENTORY
            WHERE INVENQTY IS NULL;
 NOTE: No rows were selected.
      QUIT;
 NOTE: PROCEDURE SQL used:
       real time           0.05 seconds


In the next example products are selected from the INVENTORY table that are currently stocked in inventory. Note that the predicates NOT IS NULL or IS NOT NULL can be specified to produce the same results.

SQL Code

PROC SQL;
  SELECT PRODNUM,
         INVENQTY,
         INVENCST
    FROM INVENTORY
      WHERE INVENQTY IS NOT NULL;
QUIT;

Results

                     The SAS System

             Product  Inventory   Inventory
              Number   Quantity        Cost
             ______________________________
                1110         20  $45,000.00
                1700         10  $28,000.00
                5001          5   $1,000.00
                5002          3     $900.00
                5003         10   $2,000.00
                5004         20   $1,400.00
                5001          2   $1,200.00


The IS MISSING predicate performs identically to the IS NULL predicate by selecting one or more rows containing a missing value (null). The only difference is that specifying IS NULL is the ANSI standard way of expressing the predicate and IS MISSING is commonly used in the SAS System.

The next example uses the IS MISSING predicate with the NOT predicates to select products from the INVENTORY table that are stocked in inventory.

SQL Code

PROC SQL;
  SELECT PRODNUM,
         INVENQTY,
         INVENCST
    FROM INVENTORY
      WHERE INVENQTY IS NOT MISSING;
QUIT;

2.3.6.4. Finding Patterns in a String (Pattern Matching % and _)

Constructing specific search patterns in string expressions is a simple process with the LIKE predicate. The % acts as a wildcard character representing any number of characters, including any combination of upper- or lowercase characters. Combining the LIKE predicate with the % (percent sign) permits case-sensitive searches and is a popular technique used by savvy SQL programmers to find patterns in their data.

Using the LIKE operator with the % (percent sign) provides a wildcard capability enabling the selection of table rows that match a specific pattern. The LIKE predicate is case-sensitive and should be used with care. The wildcard character % preceding and following the search word selects all product types with “Soft” in the name. The following WHERE clause finds patterns in product name (PRODNAME) containing the uppercase character “A” in the first position followed by any number of characters.

SQL Code

PROC SQL;
  SELECT PRODNAME
    FROM PRODUCTS
      WHERE PRODNAME LIKE 'A%';
QUIT;

Results

                     The SAS System

               Product Name
               _________________________
               Analog Cell Phone


The next example selects products whose name contains the word “Soft”. The resulting output contains product types such as “Software” and any other products containing the word “Soft”.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         PRODTYPE,
         PRODCOST
    FROM PRODUCTS
      WHERE PRODTYPE LIKE '%Soft%';
QUIT;

Results

                     The SAS System

                                               
 Product
  Product Name               Product Type         
 Cost
  ________________________________________________
_____
  Spreadsheet Software       Software          
 $299.00
  Database Software          Software          
 $399.00
  Wordprocessor Software     Software          
 $299.00
  Graphics Software          Software          
 $299.00


In the next example, the LIKE predicate is used to check a column for the existence of trailing blanks. The wildcard character % followed by a blank space is specified as the search argument.

SQL Code

PROC SQL;
  SELECT PRODNAME
    FROM PRODUCTS
      WHERE PRODNAME LIKE '% ';
QUIT;

Results

                     The SAS System

                Product Name
                _________________________
                Dream Machine
                Business Machine
                Travel Laptop
                Analog Cell Phone
                Digital Cell Phone
                Office Phone
                Spreadsheet Software
                Database Software
                Wordprocessor Software
                Graphics Software


When a pattern search for a specific number of characters is needed, using the LIKE predicate with the underscore (_) provides a way to pattern match character by character. Thus, a single underscore (_) in a specific position acts as a wildcard placement holder for that position only. Two consecutive underscores (__) act as a wildcard placement holder for those two positions. Three consecutive underscores act as a wildcard placement holder for those three positions. And so forth. In the next example, the first position used to search product type contains the character “P” and the next five positions (represented with five underscores) act as a placeholder for any value.

SQL Code

PROC SQL;
   SELECT PRODNAME,
          PRODTYPE,
          PRODCOST
     FROM PRODUCTS
       WHERE UPCASE(PRODTYPE) LIKE 'P_____';
QUIT;

Results

                     The SAS System

                                               
 Product
  Product Name               Product Type         
 Cost
  ________________________________________________
_____
  Analog Cell Phone          Phone              
 $35.00
  Digital Cell Phone         Phone             
 $175.00
  Office Phone               Phone             
 $130.00


The next example illustrates a pattern search of product name (PRODNAME) where the first three positions are represented as a wildcard; the fourth position contains the lowercase character “a”, followed by any combination of upper- or lowercase characters.

SQL Code

PROC SQL;
  SELECT PRODNAME
    FROM PRODUCTS
      WHERE PRODNAME LIKE '___a%';
QUIT;

Results

                     The SAS System

               Product Name
               _________________________
               Dream Machine
               Database Software


2.3.6.5. Testing for the Existence of a Value

The EXISTS predicate is used to test for the existence of a set of values. In the next example, a subquery is used to check for the existence of customers in the CUSTOMERS table with purchases from the PURCHASES table. More details on subqueries will be presented in Chapter 7, “Coding Complex Queries.”

SQL Code

PROC SQL;
  SELECT CUSTNUM,
         CUSTNAME,
         CUSTCITY
    FROM CUSTOMERS2 C
      WHERE EXISTS
        (SELECT *
          FROM PURCHASES P
            WHERE C.CUSTNUM = P.CUSTNUM);
QUIT;

Results

                        The SAS System

     Customer
       Number  Customer Name             
 Customer's Home City
     _____________________________________________
_____________

            1  Smith                      San Diego
            7  Lafler                     Spring
 Valley
           13  Thompson                   Miami
            7  Loffler                    Spring
 Valley
            1  Smithe                     San Diego
            7  Laughler                   Spring
 Valley
            7  Laffler                    Spring
 Valley


..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.135.36