The WHERE Clause

A Brief Overview

The WHERE clause enables you to subset data based on a condition that each row of the table must satisfy. PROC SQL output includes only those rows that satisfy the condition. The WHERE clause is used within the SELECT statement in a PROC SQL step. The expression in the WHERE clause can be any valid SQL expression. In the WHERE clause, you can specify any columns from the underlying tables that are specified in the FROM clause. The columns that are specified in the WHERE clause do not have to be specified in the SELECT clause.

WHERE Clause Syntax

The WHERE clause must come after the SELECT and FROM clauses.
Syntax, WHERE clause:
PROC SQL <options>;
SELECT column-1 <,...column-n>
FROM input-tables
WHERE expression;
QUIT;
expression
can be either character or numeric values. Character values are case sensitive and must be enclosed in single or double quotation marks. Double quotation marks are a SAS enhancement and typically are not allowed in database systems. Numeric values are not enclosed in quotation marks and must be standard numeric values. You cannot include special symbols such as commas or dollar signs when referencing numeric values.
To reference date and time values, use one of the following forms:
  • A SAS date value is a date written in the following form: 'ddmmm<yy>yy'd or "ddmmm<yy>yy"d.
  • A SAS time constant is a time written in the following form:'hh:mm<:ss.s>'t or "hh:mm<:ss.s>"t.
  • A SAS datetime constant is a datetime value written in the following form: 'ddmmm<yy>yy:hh:mm<:ss.s>'dt or "ddmmm<yy>yy:hh:mm<:ss.s>"dt.

Example: Using the WHERE Clause

In the following PROC SQL query, the WHERE clause selects rows in which the value of Salary is less than $32,000.
proc sql;
   select empid, jobcode, salary, salary*0.06 as bonus
      from certadv.payrollmaster
      where salary<32000
      order by jobcode;
quit;
Output 1.5 PROC SQL Query Result: Subset of Payroll with Salaries Less Than $32,000
PROC SQL Query Result: Subset of Payroll with Salaries Less Than $32,000

Subsetting Rows by Using Calculated Values

Understanding How PROC SQL Processes Calculated Columns

An earlier example showed how to define a new column by using the SELECT clause and performing a calculation. The following PROC SQL query creates the new column Total by adding the values of three existing columns: Boarded, Transferred, and Nonrevenue.
proc sql outobs=10;
   select flightnumber, date, destination,
         boarded + transferred + nonrevenue as Total
      from certadv.marchflights;
quit;
You can also use a calculated column in the WHERE clause to subset rows. However, because of the way in which SQL queries are processed, you cannot just specify the column alias in the WHERE clause. To see what happens, take the preceding PROC SQL query and add a WHERE clause in the SELECT statement to reference the calculated column Total.
proc sql outobs=10;
   select flightnumber, date, destination,
          boarded + transferred + nonrevenue as Total
      from certadv.marchflights
      where total < 100;
quit;
When this query is executed, the following error message is displayed in the SAS log.
ERROR: The following columns were not found in the contributing tables: total.
This error message is generated because, in SQL queries, the WHERE clause is processed before the SELECT clause. The SQL processor looks in the table for each column named in the WHERE clause. The table Certadv.Marchflights does not contain a column named Total, so SAS generates an error message.
Note: To avoid the error message, you must use the CALCULATED keyword.

Example: Using Calculated Values in a WHERE Clause

Use the column alias and the CALCULATED keyword in the WHERE clause to refer to a calculated value. The CALCULATED keyword tells PROC SQL that the value is calculated within the query.
proc sql outobs=10;
   select flightnumber, date, destination,
         boarded + transferred + nonrevenue as Total
      from certadv.marchflights
      where calculated total < 100;
quit;
Output 1.6 PROC SQL Query Result: Using the CALCULATED Keyword
PROC SQL Query Result: Using the CALCULATED Keyword
Note: As an alternative to using the keyword CALCULATED, repeat the calculation in the WHERE clause. However, this method is inefficient because PROC SQL must perform the calculation twice. It is the ANSI method that recalculates the WHERE clause. In the preceding query, here is what the alternate WHERE statement would be:
where boarded + transferred + nonrevenue <100;

Example: Using Calculated Values in a SELECT Clause

You can also use the CALCULATED keyword in other parts of a query. In the following example, the SELECT clause calculates the new column Total and then calculates a second new column based on Total. To create the second calculated column, you must specify the keyword CALCULATED in the SELECT clause.
proc sql outobs=10;
   select flightnumber, date, destination,
         boarded + transferred + nonrevenue as Total,
         calculated total/2 as Half
      from certadv.marchflights;
quit;
This query produces the following output.
Output 1.7 PROC SQL Query Result: Using the CALCULATED Keyword
PROC SQL Query Result: Using the CALCULATED Keyword

Subsetting Rows Using Conditional Operators

A Brief Overview

In the WHERE clause, you can specify any valid SAS expression to subset or restrict the data that is displayed in output. The expression might contain any of various types of operators, such as the following.
Type of Operator
Example
comparison
where membertype='GOLD'
logical
where visits<=3 or status='new'
concatenation
where name=trim(last) ||', '||first

Using Operators in PROC SQL

Comparison, logical, and concatenation operators are used in PROC SQL in the same way as they are used in other SAS procedures.
In PROC SQL queries, you can also use the following conditional operators. All of these operators can be used in other SAS procedures.
Conditional Operator
Looks for These Values
Example
BETWEEN-AND
values that occur within an inclusive range
where salary between 70000
             and 80000
CONTAINS or ?
values that contain a specified string
where name contains 'ER' 
where name ? 'ER'
IN
values that match one of a list of values
where code in ('PT' , 'NA', 'FA')
IS MISSING or IS NULL
missing values
where dateofbirth is missing
where dateofbirth is null
LIKE (with  %, )
values that match a specified pattern
where address like '% P%PLACE'
=*
values that sound like a specified value
where lastname=* 'Smith'
Tip
To create a negative condition, you can precede any of these conditional operators with the NOT operator.

Using the BETWEEN-AND Operator to Select within a Range of Values

Use the BETWEEN-AND operator in the WHERE clause to select rows that are based on a range of numeric or character values. The BETWEEN-AND operator is inclusive. The values that you specify as limits for the range of values are included in the query results, in addition to any values that occur between the limits.
Syntax, BETWEEN-AND operator:
BETWEEN value-1 AND value-2
value-1
is the value at one end of the range.
value-2
is the value at the other end of the range.
Note: When specifying the limits for the range of values, it is not necessary to specify the smaller value first.
Here are a few examples of WHERE clauses.
Example
Returns Rows That Contain These Values
where date between '01mar2018'd
        and '07mar2018'd
In this example, the values are specified as date constants.
the value of Date is 01mar2018, 07mar2018, or any date value in between
where salary between 70000
        and 80000
the value of Salary is 70000, 80000, or any numeric value in between
where salary not between 70000
        and 80000
the value of Salary is not between or equal to 70000 and 80000

Using the CONTAINS Operator to Select a String

The CONTAINS or question mark (?) operator is usually used to select rows for which a character column includes a particular string. These operators are interchangeable.
Syntax, CONTAINS operator:
SQL-expression CONTAINS SQL-expression
SQL-expression ? SQL-expression
SQL-expression
is a character column, string (character constant), or expression. A string is a sequence of characters to be matched that must be enclosed in quotation marks.
Note: PROC SQL retrieves a row for output no matter where the string (or second SQL-expression) occurs within the column's (or first SQL-expression's) values. Matching is case sensitive when making comparisons.
Note: The CONTAINS or question mark (?) operator is not part of the ANSI standard; it is a SAS enhancement.
The following PROC SQL query uses CONTAINS to select rows in which the Name column contains the string ER. As the output shows, all rows that contain ER anywhere within the Name column are displayed.
proc sql;
   select name
      from certadv.frequentflyers
      where name contains 'ER';
quit;
Output 1.8 PROC SQL Query Result: Name Containing String ‘ER’ (partial output)
Partial Output: PROC SQL Query Result: Name Containing String ‘ER’

Using the IN Operator to Select Values from a List

Use the IN operator to select only the rows that match one of the values in a list of fixed values, either numeric or character.
Syntax, IN operator:
column IN (constant-1<,...constant-n>)
column
specifies the selected column name.
constant-1 and constant-n
represent a list that contains one or more specific values. The list of values must be enclosed in parentheses and separated by either commas or spaces. Values can be either numeric or character. Character values must be enclosed in quotation marks.
Here are examples of WHERE clauses that contain the IN operator.
Example
Returns Rows That Contain These Values
where jobcategory in ('PT','NA','FA')
the value of JobCategory is PT, NA, or FA.
where dayofweek in (2,4,6)
the value of DayOfWeek is 2, 4, or 6.
where chesspiece not in
('pawn','king','queen')
the value of Chesspiece is anything but pawn, king, or queen.

Using the IS MISSING or IS NULL Operator to Select Missing Values

Use the IS MISSING or IS NULL operator to select rows that contain missing values, both character and numeric. These operators are interchangeable.
Syntax, IS MISSING or IS NULL operator:
column IS MISSING
column IS NULL
column
specifies the selected column name.
Note: The IS MISSING operator is not part of the ANSI standard for SQL. It is a SAS enhancement.
Suppose you want to find out whether the table Certadv.Marchflights has any missing values in the column Boarded. You can use the following PROC SQL query to retrieve rows from the table that have missing values:
proc sql;
   select boarded, transferred, nonrevenue, deplaned
      from certadv.marchflights
      where boarded is missing;
quit;
The following output displays two rows in the table that have missing values for Boarded.
Output 1.9 PROC SQL Query Result: IS MISSING
PROC SQL Query Result: Is Missing
Tip
Alternatively, you can specify missing values without using the IS MISSING or IS NULL operator, as shown in the following examples:
where boarded = .
where flight = ' '
However, the advantage of using the IS MISSING or IS NULL operator is that you do not have to specify the data type (character or numeric) of the column.

Using the LIKE Operator to Select a Pattern

To select rows that have values that match a specific pattern of characters rather than a fixed character string, use the LIKE operator. For example, using the LIKE operator, you can select all rows in which the LastName value starts with H. (If you wanted to select all rows in which the last name contains the string HAR, you would use the CONTAINS operator.)
Syntax, LIKE operator:
column LIKE 'pattern'
column
specifies the column name.
pattern
specifies the pattern to be matched and contains one or both of the special characters underscore ( _ ) and percent sign (%). The entire pattern must be enclosed in quotation marks and matching is case sensitive.
When you use the LIKE operator in a query, PROC SQL uses pattern matching to compare each value in the specified column with the pattern that you specify using the LIKE operator. The query output displays all rows in which there is a match.
You specify a pattern using one or both of the special characters shown below.
Special Character
Represents This Pattern
underscore ( _ )
any single character
percent sign (%)
any sequence of zero or more characters
Note: The underscore (_) and percent sign (%) are sometimes referred to as wildcard characters.
To specify a pattern, combine one or both of the special characters with any other characters that you want to match. The special characters can appear before, after, or on both sides of other characters.
The following PROC SQL query uses the LIKE operator to find all frequent-flyer club members whose street name begins with P and ends with the word PLACE. The following PROC SQL step performs this query:
proc sql;
   select ffid, name, address
      from certadv.frequentflyers
      where address like '%P%PLACE';
quit;
The pattern '%P%PLACE' specifies the following sequence:
  • any number of characters (%)
  • a space
  • the letter P
  • any number of characters (%)
  • the word PLACE
Output 1.10 PROC SQL Query Result: PLACE
PROC SQL Query Result: PLACE

Using the Sounds-Like (=*) Operator to Select a Spelling Variation

Use the sounds-like (=*) operator in the WHERE clause to select rows that contain a value that sounds like another value that you specify.
Syntax, sounds-like (=*) operator:
SQL-expression =* SQL-expression
SQL-expression
is a character column, string (character constant), or expression. A string is a sequence of characters to be matched that must be enclosed in quotation marks.
The sounds-like (=*) operator uses the SOUNDEX algorithm to compare each value of a column (or other SQL-expression) with the word or words (or other SQL-expression) that you specify. Any rows that contain a spelling variation of the value that you specified are selected for output.
For example, here is a WHERE clause that contains the sounds-like operator:
where lastname =* 'Smith';
The sounds-like operator does not always select all possible values. For example, suppose you use the preceding WHERE clause to select rows from the following list of names that sound like Smith:
  • Schmitt
  • Smith
  • Smithson
  • Smitt
  • Smythe
Two of the names in this list will not be selected: Schmitt and Smithson.
Note: The SOUNDEX algorithm is English-biased and is less useful for languages other than English.
Last updated: October 16, 2019
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.134.110.97