4.3. CASE Expressions

In the SQL procedure, a CASE expression provides a way of determining what the resulting value will be from all rows in a table (or view). Similar to a DATA step SELECT statement (or IF-THEN/ELSE statement), a CASE expression is based on some condition and the condition uses a WHEN-THEN clause to determine what the resulting value will be. An optional ELSE expression can be specified to handle an alternative action if none of the expression(s) identified in the when-condition(s) is satisfied.

A CASE expression must be a valid PROC SQL expression and conform to syntax rules similar to DATA step SELECT-WHEN statements. Even though this topic is best explained by example, let’s take a quick look at the syntax.

CASE <column-name>
    WHEN when-condition THEN result-expression
    <WHEN when-condition THEN result-expression> ...
    <ELSE result-expression>
END

A column name can optionally be specified as part of the CASE expression. If present, it is automatically made available to each WHEN-condition. When it is not specified, the column name must be coded in each WHEN-condition. Let’s examine how a CASE expression works.

If a WHEN-condition is satisfied by a row in a table (or view), then it is considered “true” and the result expression following the THEN expression is processed. The remaining WHEN conditions in the CASE expression are skipped. If a WHEN-condition is “false,” the next WHEN-condition is evaluated. SQL evaluates each WHEN-condition until a “true” condition is found; or in the event all WHEN-conditions are “false,” it then executes the ELSE expression and assigns its value to the CASE expression’s result. A missing value is assigned to a CASE expression when an ELSE expression is not specified and each WHEN-condition is “false.”

In the next example, let’s see how a CASE expression actually works. Suppose a value of “West”, “East”, “Central”, or “Unknown” is desired for each of the manufacturers. Using the manufacturer’s state of residence (MANUSTAT) column, a CASE expression could be constructed to assign the desired value in a unique column for each row of data. A value of “West” is assigned to the manufacturers in California, “East” for manufacturers in Florida, “Central” for manufacturers in Texas, and for all other manufacturers a value of “Unknown” is assigned to represent missing state values. A column heading of Region is assigned to the new derived output column using the AS keyword.

SQL Code

PROC SQL;
  SELECT MANUNAME,
         MANUSTAT,
         CASE
					WHEN MANUSTAT = 'CA' THEN 'West'
					WHEN MANUSTAT = 'FL' THEN 'East'
					WHEN MANUSTAT = 'TX' THEN 'Central'
					ELSE 'Unknown'
					END AS Region
    FROM MANUFACTURERS;
QUIT;

Results

                     The SAS System

                               Manufacturer
    Manufacturer Name          State         Region
    ________________________________________________

    Cupid Computer             TX            Central
    Global Comm Corp           CA            West
    World Internet Corp        FL            East
    Storage Devices Inc        CA            West
    KPL Enterprises            CA            West
    San Diego PC Planet        CA            West


Let’s look at another example. In the PRODUCTS table a column called PRODTYPE contains the type of product (for example, Laptop, Phone, Software, and Workstation) as a text string. Using a CASE expression the assignment of a new data value is derived from the coded values in the PRODTYPE column: “Laptop” = “ Hardware”, “ Phone” = “Hardware”, “Software” = “Software”, and “Workstation” = “Hardware”. A column heading of Product_Classification is assigned to the derived output column with the AS keyword.

SQL Code

PROC SQL;
  SELECT PRODNAME,
         CASE PRODTYPE
					WHEN 'Laptop'      THEN 'Hardware'
					WHEN 'Phone'       THEN 'Hardware'
					WHEN 'Software'    THEN 'Software'
					WHEN 'Workstation' THEN 'Hardware'
					ELSE 'Unknown'
					END AS Product_Classification
    FROM PRODUCTS;
QUIT;

Results

                    The SAS System

                                   Product_
        Product Name               Classification
        _________________________________________

        Dream Machine              Hardware
        Business Machine           Hardware
        Travel Laptop              Hardware
        Analog Cell Phone          Hardware
        Digital Cell Phone         Hardware
        Office Phone               Hardware
        Spreadsheet Software       Software
        Database Software          Software
        Wordprocessor Software     Software
        Graphics Software          Software


4.3.1. Case Logic versus COALESCE Expression

A popular convention among SQL users is to specify a COALESCE function in an expression to perform case logic. As described in Chapter 2, “Working with Data in PROC SQL,” the COALESCE function permits a new value to be substituted for one or more missing column values. When you specify COALESCE in an expression, PROC SQL evaluates each argument from left to right for the occurrence of a nonmissing value.

The first nonmissing value found in the list of arguments is returned; otherwise a missing value, or assigned value, is returned. This approach not only saves programming time; it makes code simpler to maintain.

Expressing logical expressions in one or more WHEN-THEN/ELSE statements is generally easy, and the expressions are easy to code, understand, and maintain. But as the complexities associated with case logic increase, the amount of coding also increases. In the following example a simple CASE expression is presented to illustrate how a value of “Unknown” is assigned and displayed when CUSTCITY is missing.

SQL Case Logic

PROC SQL;
  SELECT CUSTNAME,
         CASE
           WHEN CUSTCITY IS NOT NULL THEN CUSTCITY
           ELSE 'Unknown'
         END AS Customer_City
    FROM CUSTOMERS;
QUIT;

To illustrate the usefulness of the COALESCE function as an alternative to case logic, the same query can be modified to achieve the same results as before. By replacing the case logic with a COALESCE expression as follows, the value of CUSTCITY is automatically displayed unless it is missing. In cases of character data a value of “Unknown” is displayed. This technique makes the COALESCE function a very useful and shorthand approach indeed.

SQL COALESCE Logic

PROC SQL;
  SELECT CUSTNAME,
         COALESCE(CUSTCITY,'Unknown')
						AS Customer_City
    FROM CUSTOMER;
QUIT;

In cases where a COALESCE expression is used with numeric data, the value assigned or displayed must be of the same type as the expression. The next example shows a value of “0” (zero) being assigned and displayed when units (UNITS) from the PURCHASES table are processed.

SQL COALESCE Logic

PROC SQL;
  SELECT ITEM,
         COALESCE(UNITS, 0)
    FROM PURCHASES;
 QUIT;

Results

                      The SAS System

                item               Units
                ________________________

                Chair                  1
                Pens                  12
                Paper                  4
                Stapler                1
                Mouse Pad              1
                Pens                  24
                Markers                0


4.3.2. Assigning Labels and Grouping Data

Assigning data values and group data based on the existence of distinct values for specified table columns is a popular and frequently useful operation. Suppose you want to assign a specific data value and then group the output based on this assigned value. As a savvy SAS user you are probably thinking, “Hey, this is easy — I’ll just create a user-defined format and use it in the PRINT or REPORT procedure.”

In the next example, PROC FORMAT is used to assign temporary formatted values based on a range of values for INVENQTY. The result from executing this simple three-step (non-SQL procedure) program shows that the actual INVENQTY value is temporarily replaced with the “matched” value in the user-defined format. The FORMAT statement performs a look-up process to determine how the data should be displayed. The actual data value being looked up is not changed during the process, but a determination is made as to how its value should be displayed. The BY statement specifies how BY-group processing is to be constructed. The displayed results show the product numbers in relation to their respective inventory quantity status.

Non-SQL Code

PROC FORMAT;
  VALUE INVQTY
       0 –   5 = 'Low on Stock – Reorder'
       6 –  10 = 'Stock Levels OK'
      11 –  99 = 'Plenty of Stock'
     100 – 999 = 'Excessive Quantities';
RUN;

PROC SORT DATA=INVENTORY;
  BY INVENQTY;
RUN;

PROC PRINT DATA=INVENTORY(KEEP=PRODNUM INVENQTY) NOOBS;
  FORMAT INVENQTY INVQTY.;
RUN;

Results

                   The SAS System

          prodnum           invenqty

            5001     Low on Stock - Reorder
            5002     Low on Stock - Reorder
            5001     Low on Stock - Reorder
            1700     Stock Levels OK
            5003     Stock Levels OK
            1110     Plenty of Stock
            5004     Plenty of Stock


The same results can also be derived using a CASE expression in the SQL procedure. In the next example, a CASE expression is constructed using the INVENTORY table to assign values to the user-defined column Inventory_Status. The biggest difference between the FORMAT procedure approach and a CASE expression is that the latter uses one step and does not replace the actual data value with the recoded result. Instead, it creates a new column containing the result of the CASE expression.

SQL Code

PROC SQL;
  SELECT PRODNUM,
         CASE
						WHEN INVENQTY LE 5
						THEN 'Low on Stock - Reorder'
						WHEN 6 LE INVENQTY LE 10
						THEN 'Stock Levels OK'
						WHEN 11 LE INVENQTY LE 99
						THEN 'Plenty of Stock'
						ELSE 'Excessive Quantities'
						END AS Inventory_Status
    FROM INVENTORY
      ORDER BY INVENQTY;
QUIT;

Results

                   The SAS System

         Product
          Number   Inventory_Status
          ________________________________

             5001  Low on Stock - Reorder
             5002  Low on Stock - Reorder
             5001  Low on Stock - Reorder
             1700  Stock Levels OK
             5003  Stock Levels OK
             1110  Plenty of Stock
             5004  Plenty of Stock


4.3.3. Logic and Nulls

The existence of null values frequently introduces complexities for programmers. Instead of coding two-valued logic conditions, such as true and false, logic conditions must be designed to handle three-valued logic: true, false, and unknown. When developing logic conditions, you need to be ready to deal with the possibility of having null values. Program logic should test whether the current value of an expression contains a value or is empty (null).

Let’s examine a CASE expression that is meant to handle the possibility of having missing values in a table. Returning to an example presented earlier in this chapter, suppose we want to assign a value of “South West”, “South East”, “Central”, “Missing”, or “Unknown” to each of the manufacturers based on their state of residence.

SQL Code

PROC SQL;
  SELECT MANUNAME,
         MANUSTAT,
         CASE
           WHEN MANUSTAT = 'CA' THEN 'South West'
           WHEN MANUSTAT = 'FL' THEN 'South East'
           WHEN MANUSTAT = 'TX' THEN 'Central'
           WHEN MANUSTAT = ' '  THEN 'Missing'
           ELSE 'Unknown'
         END AS Region
    FROM MANUFACTURERS;
QUIT;

Results

                   The SAS System

                             Manufacturer
  Manufacturer Name          State         Region
  ___________________________________________________

  Cupid Computer             TX            Central
  Global Comm Corp           CA            South West
  World Internet Corp        FL            South East
  Storage Devices Inc        CA            South West
  KPL Enterprises            CA            South West
  San Diego PC Planet        CA            South West


The results indicate that there were no missing or null values in our database for the column being tested. But, suppose a new row of data were added containing null values in the manufacturer’s city and state of residence columns so our new row looked something like the following:

Manufacturer Number: 800
Manufacturer Name: Spring Valley Products
Manufacturer City: <Missing>
Manufacturer State: <Missing>.

The result would look something like the following if we ran the previous code again.

SQL Code

PROC SQL;
  SELECT MANUNAME,
         MANUSTAT,
         CASE
           WHEN MANUSTAT = 'CA' THEN 'South West'
           WHEN MANUSTAT = 'FL' THEN 'South East'
           WHEN MANUSTAT = 'TX' THEN 'Central'
           WHEN MANUSTAT = ' '  THEN 'Missing'
           ELSE 'Unknown'
         END AS Region
    FROM MANUFACTURERS;
QUIT;

Results

                    The SAS System

                              Manufacturer
   Manufacturer Name          State         Region
   ___________________________________________________

   Cupid Computer             TX            Central
   Global Comm Corp           CA            South West
   World Internet Corp        FL            South East
   Storage Devices Inc        CA            South West
   KPL Enterprises            CA            South West
   San Diego PC Planet        CA            South West
   Spring Valley Products                   Missing


..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.143.17.27