3.2. Formatting Output

PROC SQL consists of a standard set of statements and options to create, retrieve, alter, transform, and transfer data regardless of the operating system or where the data is located. These features provide tremendous power as well as control when integrating information from a variety of sources in a number of ways. Because emphasis is placed on PROC SQL’s data manipulation capabilities and not on its format and output capabilities, many programmers are unfamiliar with the SQL procedure’s output-producing features. Consequently, programmers resort to using report writers or special outputting tools to create the best looking output. To illustrate the virtues of PROC SQL in the SAS System, this chapter presents numerous examples on how output can be formatted and produced.

3.2.1. Writing a Blank Line Between Each Row

Being able to display a blank line between each row of output is available as a procedure option in PROC SQL. As with PROC PRINT, specifying DOUBLE in the SQL procedure inserts a blank line between each row of output (NODOUBLE is the default). Setting this option is especially useful when one or more flowed lines spans or wraps in the output because it provides visual separation between each row of data. This example illustrates using the DOUBLE option to double-space output.

SQL Code

PROC SQL DOUBLE;
  SELECT *
    FROM INVOICE;
QUIT;

Results

                          The SAS System

                                    Invoice
                                   Quantity
  Invoice  Manufacturer  Customer   - Units 
 Invoice Unit  Product
   Number        Number    Number      Sold       
  Price   Number
  ________________________________________________
________________
     1001           500       201         5     $1
,495.00     5001

     1002           600      1301         2     $1
,598.00     6001

     1003           210       101         7      
 $245.00     2101

     1004           111       501         3     $9
,600.00     1110

     1005           500       801         2      
 $798.00     5002

     1006           500       901         4      
 $396.00     6000

     1007           500       401         7    $23
,100.00     1200


To revert back to single-spaced output, you can specify the RESET statement as long as the QUIT statement has not been issued to turn off the SQL procedure. When PROC SQL is active, you can specify the RESET statement with or without options to reestablish each option’s original settings. When the RESET statement is specified with one or more options, only those options are reset. This example illustrates the NODOUBLE option specified to turn off double-spaced output and reset printing back to the default single-spaced output.

SQL Code

PROC SQL;
  RESET NODOUBLE;
QUIT;

3.2.2. Displaying Row Numbers

You can specify an SQL procedure option called NUMBER to display row numbers on output under the column heading Row. The NUMBER option displays row numbers on output. The next example shows the NUMBER option being specified to turn on the display of row numbers.

SQL Code

PROC SQL NUMBER;
  SELECT ITEM,
         UNITS,
         UNITCOST
    FROM PURCHASES;
QUIT;

Results

                   The SAS System

         Row  Item           Units     Unitcost
       _________________________________________

           1  Chair              1      $179.00
           2  Pens              12        $0.89
           3  Paper              4        $6.95
           4  Stapler            1        $8.95
           5  Mouse Pad          1       $11.79
           6  Pens              24        $1.59
           7  Markers            .        $0.99


3.2.3. Concatenating Character Strings

As was presented in Chapter 2, “Working with Data in PROC SQL,” two or more strings can be concatenated together to produce a combined and longer string of characters. The concatenation character string operator, represented by two vertical bars “‖”, “!!”, or “¦¦” (depending on the operating system and keyboard being used), combines two or more strings or columns together forming a new string value. The next example shows the concatenation of the manufacturer city and state columns from the MANUFACTURERS table so that the second column immediately follows the first. Although the two character strings are concatenated, the output illustrates potential problems as a result of using the concatenation operator.

First, column headers have been suppressed for both columns. Without header information, a true understanding of the contents of the output may be in jeopardy. Next, blanks are automatically padded to the entire length of the first concatenated column for each row of data resulting in a “fixed” length for the first column.

SQL Code

PROC SQL;
  SELECT manucity || manustat
    FROM MANUFACTURERS;
QUIT;

Results

               The SAS System

           ______________________
           Houston             TX
           San Diego           CA
           Miami               FL
           San Mateo           CA
           San Diego           CA
           San Diego           CA


To make the preceding output appear a bit more readable and complete, you should consider a few modifications. First, column headers could be assigned as aliases using the AS operator. The maximum size of a user-defined column header is 32 bytes (following valid SAS naming conventions). Finally, the TRIM function (described in Chapter 2, “Working with Data in PROC SQL”) could be used to remove trailing blanks in the city column. This makes the second column act as a floating field.

SQL Code

PROC SQL;
  SELECT TRIM(manucity) || manustat AS Headquarters
    FROM MANUFACTURERS;
QUIT;

Results

                  The SAS System

               Headquarters
               ______________________
               HoustonTX
               San DiegoCA
               MiamiFL
               San MateoCA
               San DiegoCA
               San DiegoCA


Although the preceding output illustrates that some changes were made, it still is difficult to read. A few more cosmetic changes should be made to make it more aesthetically appealing and readable. In the next section, the output will be customized to give the data further separation.

3.2.4. Inserting Text and Constants Between Columns

At times, it is useful to be able to insert text and/or constants in query output. This enables special characters including symbols and comments to be inserted in the output. We can improve the output in the previous example by inserting a comma “,” and a single blank space between the manufacturer city and state information. The final output illustrates an acceptable way to display columnar data using a “free-floating” presentation as opposed to fixed columns.

SQL Code

PROC SQL;
  SELECT trim(manucity) || ', ' || manustat
           As Headquarters
    FROM MANUFACTURERS;
QUIT;

Results

                The SAS System

                Headquarters
                _____________________
                Houston, TX
                San Diego, CA
                Miami, FL
                San Mateo, CA
                San Diego, CA
                San Diego, CA


Another method of automatically concatenating character strings, removing leading and trailing blanks, and inserting text and constants is with the CATX function. The next example shows how the CATX function is specified with a “,” used as a separator between character strings MANUCITY and MANUSTAT.

SQL Code

PROC SQL;
  SELECT CATX(',', manucity, manustat)
           As Headquarters
    FROM MANUFACTURERS;
QUIT;

Results

                   The SAS System

                Headquarters
                _____________________
                Houston, TX
                San Diego, CA
                Miami, FL
                San Mateo, CA
                San Diego, CA
                San Diego, CA


3.2.5. Using Scalar Expressions with Selected Columns

In computing terms, a scalar refers to a quantity represented by a single number or value. The value is not represented as an array or list of values, but as a single value. For example, the value 7 is a scalar value, but (0,0,7) is not. PROC SQL allows the use of scalar expressions and constants with selected columns. Typically, these expressions replace or augment one or more columns in the SELECT statement. To illustrate how a scalar expression is used, assume that a value of 7.5% representing the sales tax percentage is computed for each product in the PRODUCTS table. The results consist of the product name, product cost, and derived computed sales tax column.

Note:Although the computed column contains the results of the sales tax computation for each product, it does not contain a column heading.

SQL Code

PROC SQL;
  SELECT prodname, prodcost,
         .075 * prodcost
    FROM PRODUCTS;
QUIT;

Results

                    The SAS System

                                  Product
     Product Name                    Cost
     ______________________________________________

     Dream Machine              $3,200.00       240
     Business Machine           $3,300.00     247.5
     Travel Laptop              $3,400.00       255
     Analog Cell Phone             $35.00     2.625
     Digital Cell Phone           $175.00    13.125
     Office Phone                 $130.00      9.75
     Spreadsheet Software         $299.00    22.425
     Database Software            $399.00    29.925
     Wordprocessor Software       $299.00    22.425
     Graphics Software            $299.00    22.425


In the next two examples, a column header or alias is assigned to the derived sales tax column computed in the previous example. Two methods exist for achieving this. The first method uses the AS keyword to not only name the derived column, but also to permit referencing the column later in the query. This is useful in situations where a reference to the ordinal position is needed. The next example uses the ordinal position to reference a column in a query with the ORDER BY clause.

SQL Code

PROC SQL;
  SELECT prodname, prodcost,
         .075 * prodcost AS Sales_Tax
    FROM PRODUCTS
      ORDER BY 3;
QUIT;

Results

                      The SAS System

                                    Product
       Product Name                    Cost  Sales_Tax
       _______________________________________________

       Analog Cell Phone             $35.00      2.625
       Office Phone                 $130.00       9.75
       Digital Cell Phone           $175.00     13.125
       Spreadsheet Software         $299.00     22.425
       Graphics Software            $299.00     22.425
       Wordprocessor Software       $299.00     22.425
       Database Software            $399.00     29.925
       Dream Machine              $3,200.00        240
       Business Machine           $3,300.00      247.5
       Travel Laptop              $3,400.00        255


The next example illustrates the second method of assigning a column heading for the computed sales tax column using the LABEL= option. To further enhance the output’s readability, a numeric dollar format is specified.

Note:Because the next example is only a query and the table is not being updated, the assigned attributes are only available for the duration of the step and are not permanently saved in the table’s record descriptor.

SQL Code

PROC SQL;
  SELECT prodname, prodcost,
         .075 * prodcost FORMAT=DOLLAR7.2
						LABEL='Sales Tax'
    FROM PRODUCTS;
QUIT;

Results

                      The SAS System

                                   Product     Sales
      Product Name                    Cost       Tax
      _______________________________________________

      Dream Machine              $3,200.00    $240.00
      Business Machine           $3,300.00    $247.50
      Travel Laptop              $3,400.00    $255.00
      Analog Cell Phone             $35.00      $2.63
      Digital Cell Phone           $175.00     $13.13
      Office Phone                 $130.00      $9.75
      Spreadsheet Software         $299.00     $22.43
      Database Software            $399.00     $29.93
      Wordprocessor Software       $299.00     $22.43
      Graphics Software            $299.00     $22.43


3.2.6. Ordering Output by Columns

By definition, tables are unordered sets of data. The data that comes from a table does not automatically appear in any particular order. To offset this behavior, the SQL procedure provides the ability to impose order in a table by using an ORDER BY clause. When used, this clause orders the query results according to the values in one or more selected columns, it must be specified after the FROM clause.

Rows of data can be ordered in ascending or descending (DESC) order for each column specified (ascending is the default order). To illustrate how selected columns of data can be ordered, let’s first view the PRODUCTS table and all its columns arranged in ascending order by product number (PRODNUM).

SQL Code

PROC SQL;
  SELECT *
    FROM PRODUCTS
      ORDER BY prodnum;
QUIT;

Results

                          The SAS System

                             Manufacturer         
         Product
   Product    Product Name         Number  Product
 Type       Cost
    Number
   _______________________________________________
________________
     1110     Dream Machine           111 
 Workstation   $3,200.00
     1200     Business Machine        120 
 Workstation   $3,300.00
     1700     Travel Laptop           170  Laptop 
       $3,400.00
     2101     Analog Cell Phone       210  Phone  
          $35.00
     2102     Digital Cell Phone      210  Phone  
         $175.00
     2200     Office Phone            220  Phone  
         $130.00
     5001     Spreadsheet Software    500 
 Software        $299.00
     5002     Database Software       500 
 Software        $399.00
     5003     Wordprocessor Software  500 
 Software        $299.00
     5004     Graphics Software       500 
 Software        $299.00


The next example illustrates a query that selects and orders multiple columns of data from the PRODUCTS table. Output is arranged first in ascending order by product type (PRODTYPE) and within product type in descending order by product cost. The code and output is shown.

SQL Code

PROC SQL;
  SELECT prodname, prodtype, prodcost, prodnum
    FROM PRODUCTS
      ORDER BY prodtype, prodcost DESC;
QUIT;

Results

                          The SAS System

                                             
 Product     Product
     Product Name            Product Type       
 Cost      Number
     _____________________________________________
_______________
     Travel Laptop           Laptop         $3,400
.00        1700
     Digital Cell Phone      Phone            $175
.00        2102
     Office Phone            Phone            $130
.00        2200
     Analog Cell Phone       Phone             $35
.00        2101
     Database Software       Software         $399
.00        5002
     Spreadsheet Software    Software         $299
.00        5001
     Graphics Software       Software         $299
.00        5004
     Wordprocessor Software  Software         $299
.00        5003
     Business Machine        Workstation    $3,300
.00        1200
     Dream Machine           Workstation    $3,200
.00        1110


3.2.7. Grouping Data with Summary Functions

Occasionally it may be useful to display data in designated groups. A GROUP BY clause is used in these situations to aggregate and order groups of data using a designated column with the same value. When a GROUP BY clause is used without a summary function, SAS issues a warning on the SAS log with the message, “A GROUP BY clause has been transformed into an ORDER BY clause because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function.” The GROUP BY is transformed into an ORDER BY clause and then processed.

When a GROUP BY clause is used without specifying a summary function in the SELECT statement, the entire table is treated as a single group and ordered in ascending order. The next example illustrates a GROUP BY clause without any summary function specifications. Due to the absence of any summary functions, the GROUP BY clause is automatically transformed into an ORDER BY clause, with the rows being ordered in ascending order by product type (PRODTYPE).

SQL Code

PROC SQL;
  SELECT prodtype,
         prodcost
    FROM PRODUCTS
      GROUP BY prodtype;
QUIT;

Results

                  The SAS System

                                     Product
                  Product Type          Cost
                  __________________________

                  Laptop           $3,400.00
                  Phone              $130.00
                  Phone              $175.00
                  Phone               $35.00
                  Software           $299.00
                  Software           $299.00
                  Software           $299.00
                  Software           $399.00
                  Workstation      $3,200.00
                  Workstation      $3,300.00


When a GROUP BY clause is used with a summary function, the rows are aggregated in a series of groups. This means that an aggregate function is evaluated on a group of rows and not on a single row at a time. Suppose the least expensive product in each product category needs to be identified. A separate query for each product category could be specified using the MIN function to determine the cheapest product. But this would require separate runs to be executed — not a very good approach. A better way to do this would be to specify a GROUP BY clause in a single statement as follows.

SQL Code

PROC SQL;
  SELECT prodtype,
         MIN(prodcost) AS Cheapest
						Format=dollar9.2 Label='Least Expensive'
    FROM PRODUCTS
      GROUP BY prodtype;
QUIT;

Results

                     The SAS System

                                     Least
               Product Type      Expensive
               ___________________________
               Laptop            $3,400.00
               Phone                $35.00
               Software            $299.00
               Workstation       $3,200.00


3.2.8. Grouping Data and Sorting

In the absence of an ORDER BY clause, the SQL procedure automatically sorts the results from a grouped query in the same order as specified in the GROUP BY clause. When both an ORDER BY and GROUP BY clause are specified for the same column(s) or column order, no additional processing occurs to satisfy the request. Because the ORDER BY and GROUP BY clauses are not mutually exclusive, they can be used together. Internally, the GROUP BY clause first sorts the results on the grouping column(s) and then aggregates the rows of the query by the same grouping column.

But what happens when the column(s) specified in the ORDER BY and GROUP BY clauses are not the same? In these situations additional processing requirements are generally needed. The additional processing, in the worst case scenario, may require remerging summary statistics back with the original data. In other cases, additional sorting requirements may be necessary. Suppose information about the least expensive product in each product category is desired. But instead of automatically sorting the results in ascending order by product type, as before, the results will be displayed in ascending order by the least expensive product.

SQL Code

PROC SQL;
  SELECT prodtype,
         MIN(prodcost) AS Cheapest
						Format=dollar9.2 Label='Least Expensive'
    FROM PRODUCTS
      GROUP BY prodtype
						ORDER BY cheapest;
QUIT;

Results

                     The SAS System

                                    Least
             Product Type       Expensive
             ____________________________
             Phone                 $35.00
             Software             $299.00
             Workstation        $3,200.00
             Laptop             $3,400.00


3.2.9. Subsetting Groups with the HAVING Clause

When processing groups of data, it is frequently useful to subset aggregated rows (or groups) of data. This way, aggregated data can be filtered one group at a time in contrast to the WHERE clause where individual rows of data are filtered one row at a time, not aggregated rows. SQL provides a convenient way to subset (or filter) groups of data by using the GROUP BY and HAVING clauses. The HAVING clause is applied after the summary of all observations.

Suppose you want to identify only those product groupings that have an average cost less than $200.00 from the PRODUCTS table. Your first inclination might be to use a summary function in a WHERE clause. But this would not be valid because a WHERE clause is designed specifically to evaluate a single row at a time. This is in direct contrast with the way a summary function processes data because summary functions evaluate groups of rows at a time, not a single row of data at a time as with a WHERE clause. For those already familiar with subqueries as discussed in Chapter 7, “Coding Complex Queries,” you could also approach the problem as a complex query. But there is an easier and more straightforward way of identifying and selecting the desired product groups using the GROUP BY and HAVING clauses, as follows.

SQL Code

PROC SQL;
  SELECT prodtype,
         AVG(prodcost)
            FORMAT=DOLLAR9.2 LABEL='Average Product Cost'
    FROM PRODUCTS
      GROUP BY prodtype
        HAVING AVG(prodcost) <= 200.00;
QUIT;

Results

                  The SAS System

                                 Average
                                 Product
             Product Type           Cost
             ___________________________
             Phone               $113.33


..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.135.204.0