6. Selection Criteria

Keywords Introduced

WHERE • TOP • PERCENT • LIKE • SOUNDEX • DIFFERENCE

Up until this point, the SELECT statements we’ve seen have always brought back every row in the table. This would rarely be the case in real-world situations. One is normally interested only in retrieving data that meets certain criteria. For example, if you’re selecting orders, you probably only want to see orders that meet certain conditions. When looking at products, you ordinarily only want to view certain types of products. Rarely does someone want to simply see everything. Your interest in data is typically directed toward a small subset of that data in order to analyze or view one particular aspect.

Applying Selection Criteria

Selection criteria in SQL begins with the WHERE clause. The WHERE keyword accomplishes the task of selecting a subset of rows. This is the general format of the SELECT statement, including the WHERE clause and other clauses already discussed:

SELECT columnlist
FROM tablelist
WHERE condition
ORDER BY columnlist

As can be seen, the WHERE clause must always come between the FROM and ORDER BY clauses. In fact, if any clause is used, it must appear in the order shown above.

Let’s look at an example, taken from data in this Sales table:

Image

We’ll start with a statement with a simple WHERE clause:

SELECT
FirstName,
LastName,
QuantityPurchased
FROM Sales
WHERE LastName = ‘Baxter’

The output is:

Image

Because the WHERE clause stipulates to select only rows with a LastName equal to ‘Baxter’, only one of the three rows in the table is returned. Notice that the desired value of the LastName column was enclosed in quotes, due to the fact that LastName is a character column. For numeric fields, no quotes are necessary. For example, the following SELECT is equally valid and returns the same data:

SELECT
FirstName,
LastName,
QuantityPurchased
FROM Sales
WHERE QuantityPurchased = 5

WHERE Clause Operators

In the previous statements, an equals sign (=) is used as the operator in the WHERE clause. The equals sign indicates a test for equality. The general format shown above requires that a condition follows the WHERE clause. This condition consists of an operator with expressions on either side.

The following is a list of the basic operators that can be used in the WHERE clause:

Image

More advanced operators will be covered in the next chapter.

The meaning of the equals (=) and does not equal (<>) operators should be obvious. Here’s an example of a WHERE clause with an “is greater than” operator, taken from the same Sales table:

SELECT
FirstName,
LastName,
QuantityPurchased
FROM Sales
WHERE QuantityPurchased > 6

The result is:

Image

In this example, only one row meets the condition that the QuantityPurchased column be greater than 6. Although not as commonly used, it is also possible to use the “is greater than” operator with a text column. This example:

SELECT
FirstName,
LastName
FROM Sales
WHERE LastName > ‘C’

returns:

Image

Because the test is for last names greater than C, it brings back only Li and Dupont, but not Baxter. When applied to text fields, the greater than and less than operators indicate selection by the alphabetic order of the values. In this case, Li and Dupont are returned, since L and D come after C in the alphabet.

Limiting Rows

We may sometimes want to select a small subset of the rows in a table, but don’t care which rows are returned. Let’s say we have a table with 50,000 rows and want to see just a few rows of data to get an idea of what it looks like. It wouldn’t make sense to use the WHERE clause for this purpose, since we don’t care which particular rows are brought back.

In this situation, the solution is to use a special keyword to specify a limit as to how many rows are returned. This is another instance where syntax differs among databases. In Microsoft SQL Server, the keyword that accomplishes this limit is TOP. The general format is:

SELECT
TOP number
columnlist
FROM tablelist

Let’s say that we want to see the first 10 rows from a table. The SELECT to accomplish this looks like:

SELECT
TOP 10 *
FROM table

This statement returns all columns in the first 10 rows from the table. Like any SELECT statement without an ORDER BY clause, there’s no way to predict which 10 rows will be returned. It depends on how the data is physically stored in the table.

Similarly, we can list specific columns to return:

SELECT
TOP 10
column1,
column2
FROM table

Another variant of the TOP is to use a PERCENT keyword in addition to TOP. For example, the general format to return 25% of rows is:

SELECT
TOP 25 PERCENT
column1,
column2
FROM table

In essence, the TOP keyword accomplishes something similar to the WHERE clause, as it returns a small subset of rows in the specified table. However, keep in mind that rows returned using the TOP keyword are not a true random sample, in a statistical sense. They’re only the first rows that qualify, based on how the data is physically stored in the database.

Limiting Rows with a Sort

Another use of the TOP keyword is to use it in combination with the ORDER BY clause to obtain a designated number of rows with the highest values, based on specified criteria. This type of data selection is commonly referred to as a Top N selection. Here’s an example, taken from this Books table:

Image

Let’s say we want to see the three books that sold the most in the current month. The SELECT that accomplishes this is:

SELECT
TOP 3
Title AS ‘Book Title’,
CurrentMonthSales AS ‘Quantity Sold’
FROM Books
ORDER BY CurrentMonthSales DESC

The output is:

Image

Let’s examine this statement in some detail. The TOP 3 in the second line indicates that only three rows of data are to be returned. The main question to ask is how it determines which three rows to display. The answer is found in the ORDER BY clause. If there were no ORDER BY clause, then the SELECT would simply bring back any three rows of data. However, this is not what we want. We’re looking for the three rows with the highest sales. To accomplish this, we need to sort the rows by the CurrentMonthSales column in descending order. Why descending? When data is sorted in descending order, the highest numbers appear first. If we had sorted in an ascending order, we would get the books with the least number of sales, not the most.

Adding one more twist to this scenario, let’s say we only want to see which book by Shakespeare had the greatest sales. In order to accomplish this, we need to add a WHERE clause, as follows:

SELECT
TOP 1
Title AS ‘Book Title’,
CurrentMonthSales AS ‘Quantity Sold’
FROM Books
WHERE Author = ‘Shakespeare’
ORDER BY CurrentMonthSales DESC

This brings back this data:

Image

The WHERE clause adds the qualification to look only at books by Shakespeare. We also revised the TOP keyword to specify TOP 1, indicating that we want to see only one row of data.

Pattern Matching

We now want to turn to a situation in which the data to be retrieved is not precisely defined. We often want to view data based on inexact matches with words or phrases. For example, you might be interested in finding companies whose name contains the word “bank.” The selection of data via inexact matches within phrases is often referred to as pattern matching. In SQL, the LIKE operator is used in the WHERE clause to enable us to find matches against part of a column value. The LIKE operator requires the use of special wildcard characters to specify exactly how the match should work. Let’s start with an example from the following Movies table.

Image

Our first example with a LIKE operator is:

SELECT
MovieTitle AS ‘Movie’
FROM Movies
WHERE MovieTitle LIKE ‘%LOVE%’

In this example, the percent (%) sign is used as a wildcard. The percent (%) is the most commonly used wildcard, and it means any characters. Any characters includes the possibility of there being no characters. The percent (%) before LOVE means that we will accept a phrase with any (or no) characters before LOVE. Similarly, the percent (%) after LOVE means we’ll accept a phrase with any (or no) characters after LOVE. In other words, we’re looking for any movie title that contains the word LOVE. Here is the data returned from the SELECT:

Image

Notice that LOVE appears as the first word, the last word, and sometimes in the middle of the movie title.

Let’s now attempt to find only movies that begin with LOVE. If we issue:

SELECT
MovieTitle AS ‘Movie’
FROM Movies
WHERE MovieTitle LIKE ‘LOVE%’

we will retrieve only this row of data:

Image

Because we’re now specifying the percent (%) wildcard only after the word LOVE, we get back only movies that begin with LOVE. Similarly, if we issue:

SELECT
MovieTitle AS ‘Movie’
FROM Movies
WHERE MovieTitle LIKE ‘%LOVE’

we get only this row:

Image

This is because we have now specified that the phrase must end with the word LOVE.

One might ask how to arrange the wildcards to see only movies that contain the word LOVE in the middle of the title, without seeing movies where LOVE is at the beginning or end. The solution is to specify:

SELECT
MovieTitle AS ‘Movie’
FROM Movies
WHERE MovieTitle LIKE ‘% LOVE %’

Notice that a space has been inserted between the word LOVE and the percent (%) wildcards on either side. This ensures that there is at least one space on both sides of the word. The data brought back from this statement is:

Image

The percent (%) sign is the most common wildcard used with the LIKE operator, but there are a few other possibilities. These include the underscore character (_), a characterlist enclosed in square brackets, and a caret symbol (^) plus a characterlist enclosed in square brackets. The following table lists these wildcards and their meanings:

Image

We’ll use the following Actors table to illustrate statements for these wildcards.

Image

Here’s an example of how the underscore (_) wildcard character can be used:

SELECT
FirstName,
LastName
FROM Actors
WHERE FirstName LIKE ‘_ARY’

The output of this SELECT is:

Image

This statement retrieves these three actors because all have a first name consisting of exactly one character followed by the phrase ARY.

Likewise, if we issue this statement:

SELECT
FirstName,
LastName
FROM Actors
WHERE FirstName LIKE ‘J_N’

it produces:

Image

The actor John Cusack is not selected since John doesn’t fit the J_N pattern. An underscore stands for only one character.

The final wildcards we’ll discuss, [characterlist] and [^characterlist], enable you to specify multiple wildcard values in a single position.

The following illustrates the [characterlist] wildcard:

SELECT
FirstName,
LastName
FROM Actors
WHERE FirstName LIKE ‘[CM]ARY’

This retrieves any rows where FirstName begins with a C or M and ends with ARY. The result is:

Image

The following illustrates the [^characterlist] wildcard:

SELECT
FirstName,
LastName
FROM Actors
WHERE FirstName LIKE ‘[^CM]ARY’

This selects any rows where FirstName does not begin with a C or M and ends with ARY. The result is:

Image

Matching by Sound

Let’s turn from matching letters and characters to matching sounds. SQL provides two functions that give you some interesting ways to compare the sounds of words or phrases in the English language. The two functions are SOUNDEX and DIFFERENCE.

Let’s first look at an example that utilizes the SOUNDEX function:

SELECT
SOUNDEX (‘Smith’) AS ‘Sound of Smith’,
SOUNDEX (‘Smythe’) AS ‘Sound of Smythe’

The result is:

Image

The SOUNDEX function always returns a four-character response, which is a sort of code for the sound of the phrase. The first character is always the first letter of the phrase. In this case, the first character is S because both Smith and Smythe begin with an S.

The remaining three characters are calculated from an analysis of the sound of the rest of the phrase. Internally, the function first removes all vowels and the letter Y. So, the function takes the MITH from SMITH and converts it to MTH. Likewise, it takes the MYTHE from SMYTHE and converts it to MTH. It then assigns a number to represent the sound of the phrase. In this example, that number turns out to be 530.

Since SOUNDEX returns a value of S530 for both Smith and Smythe, you can conclude that they probably have very similar sounds.

Microsoft SQL Server provides one additional function, called DIFFERENCE, which works in conjunction with the SOUNDEX function.

Here’s an example, using the same words:

SELECT
DIFFERENCE (‘Smith’, ‘Smythe’) AS ‘The Difference’

The result is:

Image

The DIFFERENCE function always requires two arguments. Internally, the function first retrieves the SOUNDEX values for each of the arguments and then compares those values. If it returns a value of 4, as in this example, that means that all four characters in the SOUNDEX value are identical. A value of 0 means that none of the characters are identical. Therefore, a DIFFERENCE value of 4 indicates the highest possible match, and a value of 0 is the lowest possible match.

With this in mind, here’s an example of how the DIFFERENCE function can be used to retrieve values that are very similar in sound to a specific phrase. Working from the Actors table, let’s attempt to find rows with a first name that sounds like John. The SELECT statement is:

SELECT
FirstName,
LastName
FROM Actors
WHERE DIFFERENCE (FirstName, ‘John’) = 4

The results are:

Image

The DIFFERENCE function concluded that both John and Jon had a difference value of 4 between the name and the specified value of John.

If you want to analyze exactly why these two rows were selected, you can alter your SELECT to show both the SOUNDEX and DIFFERENCE values for all rows in the table:

SELECT
FirstName,
LastName,
DIFFERENCE (FirstName, ‘John’) AS ‘Difference Value’,
SOUNDEX (FirstName) AS ‘Soundex Value’
FROM Actors

This returns:

Image

Notice that both Jon Voight and John Cusack have a SOUNDEX value of J500 and a DIFFERENCE value of 4 for their first names. This explains why they were initially selected.

Looking Ahead

This chapter introduced the topic of how to apply selection criteria to queries. A number of basic operators, such as equals and greater than, were introduced. The ability to specify these types of basic selection criteria goes a long way toward making the SELECT statement truly useful. We also covered the related topic of limiting the number of rows returned in a query. The ability to limit rows in combination with an ORDER BY clause allows for a useful Top N type of data selection.

We next discussed how to match words or phrases via a specified pattern. Matching by patterns is a significant and widely used function of SQL. Any time you enter a word in a search box and attempt to retrieve all entities containing that word, you are utilizing pattern matching. We concluded the chapter with a look at matching by sound, a practice which is less common that matching by word patterns. The technology exists but there is an inherent difficulty in translating words to sounds, due to the many quirks and exceptions in the English language.

In our next chapter, “Boolean Logic,” we’ll greatly enhance our selection criteria capabilities by introducing a number of new keywords that add sophisticated logic to the WHERE clause. At present, we can do such things as select all customers from the state of New York. In the real world, however, much more is typically required. Boolean logic will allow us to formulate a query that will select customers who are in New York or California but not in Los Angeles or Albuquerque.

Applying Selection Criteria

WHERE Clause Operators

Limiting Rows

Limiting Rows with a Sort

Pattern Matching

Matching by Sound

Looking Ahead

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.157.186