Now that you’ve learned the basics of the T-SQL language, you’re ready to move on to more advanced techniques. To really take advantage of what T-SQL has to offer, you must know how to return recordsets that contain data based on multiple tables. In this hour you’ll learn:
• What Join Types Are Available and How to Use Them
• How to Output Data Based on Multiple Tables
• How to Group Your Query Output
• About the Aggregate Functions Available to You
• What the Having Clause Is and When to Use It
• How to Union Data from Multiple Tables
• How to Build Subqueries
The examples that follow are based on a database called Northwind which is included on the sample code CD. Instructions as to how to install the Northwind database are included in Appendix A.
When you build a system based on normalized table structures, you must join the tables back together to see the data in a useable format. For example, if you have separated customers, orders, and order details, you need to join these tables in a query to see the name of the customer who placed an order for a particular item. Several types of joins are available. They include inner joins, outer joins, full joins, and self-joins. The sections that follow cover each of these join types.
An inner join is the most common type of join. When you use an inner join, only rows on the one side of the relationship that have matching rows on the many side of the relationship are included in the output. Here’s an example:
SELECT Customers.CustomerID,
Customers.CompanyName, Orders.OrderID,
Orders.OrderDate
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
This example includes only those customers who have orders.
The word OUTER
is assumed in the LEFT JOIN
clause used when building a left outer join.
At times you need to join more than two tables in a SQL statement. The most common syntax is
FROM table1 JOIN table2 ON condition1 JOIN table3 ON condition2
The following example joins the Customers, Orders, and OrderDetails tables:
SELECT Customers. CustomerID, Customers.CompanyName,
Orders.OrderID, Orders.OrderDate
FROM (Customers
INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID)
INNER JOIN [Order Details]
ON Orders.OrderID = [Order Details].OrderID
In the example, the order of the joins is unimportant. The exception to this is when you combine inner and outer joins. When you combine inner and outer joins, the SQL Server engine applies two specific rules. First, the nonpreserved table in an outer join cannot participate in an inner join. The nonpreserved table is the one whose rows may not appear. In the case of a left outer join from Customers to Orders, the Orders table is considered the nonpreserved table. Therefore, it cannot participate in an inner join with OrderDetails. The second rule is that the nonpreserved table in an outer join cannot participate with another nonpreserved table in another outer join.
An outer join enables you to include rows from one side of the join in the output, regardless of whether matching rows exist on the other side of the join. Two types of outer joins exist: left outer joins and right outer joins. With a left outer join, SQL Server includes in the output all rows in the first table specified in the SELECT
statement. Here’s an example:
SELECT Customers.CustomerID,
Customers.CompanyName, Orders.OrderID,
Orders.OrderDate
FROM Customers
LEFT OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
In the previous example, customers are included regardless of whether they have orders. With the right outer join shown next, orders are included whether or not they have associated customers. If you have properly enforced referential integrity, this scenario should never exist.
SELECT Customers.CustomerID,
Customers.CompanyName, Orders.OrderID,
Orders.OrderDate
FROM Customers
RIGHT OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
A full join combines the behavior of the left and right outer joins. It looks like this:
SELECT Customers.CustomerID,
Customers.CompanyName, Orders.OrderID,
Orders.OrderDate
FROM Customers
FULL JOIN Orders ON Customers.CustomerID = Orders.CustomerID
In this example, all customers appear in the output regardless of whether they have orders, and all orders appear in the output whether or not they are associated with customers.
A self-join involves joining a table to itself. Although it is not the most common type of join, this join type is very valuable. Imagine the scenario in which an Employee table contains a field called EmployeeID and another field called ReportsTo. The ReportsTo field must contain a valid EmployeeID. It would not make sense to have separate Employee and Supervisor tables because supervisors are employees. This is where the self-join comes in. A self-join looks like this:
SELECT Employees.EmployeeID, Employees.LastName, Employees.FirstName,
Supervisors.EmployeeID as SupervisorID,
Supervisors.LastName as SupervisorLastName,
Supervisors.FirstName as SupervisorFirstName
FROM Employees INNER JOIN Employees as Supervisors
ON Employees.ReportsTo = Supervisors.EmployeeID
In this example, the EmployeeID from the Employees table is joined to an alias of the ReportsTo field of an alias of the Employees table (called Supervisors). The resulting employee and supervisor information is output from the query (see Figure 7.1).
Figure 7.1. A SELECT
statement that shows the result of joining the Employee table to itself.
GROUP BY
ClauseYou can use the GROUP BY
clause to calculate summary statistics. The syntax of the GROUP BY
clause is
GROUP BY group-by-expression1 [,group-by-expression2 [,...]]
You use the GROUP BY
clause to dictate the fields on which SQL Server groups the query result. When you include multiple fields in a GROUP BY
clause, they are grouped from left to right. SQL Server automatically outputs the fields in the order designated in the GROUP BY
clause. In the following example, the SELECT
statement returns the country, city, and total freight for each country/city combination. The results are displayed in order by country and city (see Figure 7.2):
SELECT Customers.Country, Customers.City,
Sum(Orders.Freight) AS SumOfFreight
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
GROUP BY Customers.Country, Customers.City
Figure 7.2. A SELECT
statement that returns the country, city, and total freight for each country/city combination.
The GROUP BY
clause indicates that SQL Server doesn’t display the detail for the selected records. Instead, it displays the fields indicated in the GROUP BY
uniquely. One of the fields in the SELECT
statement must include an aggregate function. SQL Server displays this result of the aggregate function along with the fields specified in the GROUP BY
clause.
You use aggregate functions to summarize table data. The aggregate functions available include COUNT
, COUNT_BIG
, SUM
, AVG
, MIN
, and MAX
. The following sections discuss each of these aggregate functions. You can find additional aggregate functions in the Books Online.
COUNT
FunctionYou use the COUNT
function to count the number of rows in a table. It looks like this:
SELECT COUNT(*) AS CountOfCustomers FROM Customers
The example counts the number of rows in the Customers table (see Figure 7.3).
Figure 7.3. A SELECT
statement that counts the number of rows in the Customers table.
As an alternative, you can count values in a particular column. The SQL statement looks like this:
SELECT COUNT(Region) AS CountOfCustomers FROM Customers
This example counts the number of regions found in the Customers table (see Figure 7.4).
Figure 7.4. A SELECT
statement that counts the number of regions found in the Customers table.
COUNT_BIG
FunctionThe COUNT_BIG
function is identical to the COUNT
function, except that it returns a bigint
data type. It looks like this:
SELECT COUNT_BIG(Region) AS CountOfCustomers FROM Customers
SUM
FunctionThe SUM
function is available only for numeric columns. It adds the data in the columns. Here’s an example:
SELECT SUM(Freight) FROM Orders
The example totals the Freight column for all rows in the Orders table. When used with the GROUP BY
clause, the SUM
function can easily total values for each grouping.
SELECT ShipVia, SUM(Freight) FROM Orders GROUP BY ShipVia
The example totals the freight for each shipper (see Figure 7.5).
Figure 7.5. A SELECT
statement that totals the freight for each shipper found in the Orders table.
AVG
FunctionJust as you can easily total data, you can average data. The following statement finds the average freight for all orders in the Orders table:
SELECT AVG(Freight) FROM Orders
When used with the GROUP BY
clause, the AVG
function can easily average values in each grouping.
SELECT ShipVia, AVG(Freight) FROM Orders GROUP BY ShipVia
The result provides the average freight for each shipper (see Figure 7.6).
Figure 7.6. A SELECT
statement that provides the average freight for each shipper found in the Orders table.
MIN
FunctionAnother important aggregate function is MIN
. You use the MIN
function to find the minimum value in a column. This statement finds the minimum freight in the Orders table:
SELECT MIN(Freight) FROM Orders
When used with the GROUP BY
clause, the MIN
function can easily find the minimum values in each grouping.
SELECT ShipVia, MIN(Freight) FROM Orders GROUP BY ShipVia
The result provides the minimum freight for each shipper (see Figure 7.7).
Figure 7.7. A SELECT
statement that provides the minimum freight for each shipper found in the Orders table.
MAX
FunctionA related aggregate function is MAX
. You use the MAX
function to find the maximum value in a column. This statement finds the maximum freight in the Orders table (see Figure 7.8):
SELECT MAX(Freight) FROM Orders
Figure 7.8. You use the MAX
function to find the maximum value in a column.
When used with the GROUP BY
clause, the MAX
function can easily find the maximum values in each grouping.
SELECT ShipVia, MAX(Freight) FROM Orders GROUP BY ShipVia
The result provides the maximum freight for each shipper (see Figure 7.9).
Figure 7.9. When used with the GROUP BY
clause, the MAX
function can easily find the maximum values in each grouping.
HAVING
ClauseA HAVING
clause is similar to a WHERE
clause, but it differs in one major respect: SQL Server applies it after it summarizes the data rather than beforehand. In other words, the WHERE
clause is used to determine which rows are grouped. The HAVING
clause determines which groups are included in the output. A HAVING
clause looks like this:
HAVING expression1 [{AND|OR} expression2[...]]
In the following example, SQL Server applies the criteria > 1000
after it applies the aggregate function SUM
to the grouping. Therefore, SQL Server includes only country/city combinations with total freight greater than 1000 in the output (see Figure 7.10).
Figure 7.10. A SELECT
statement that includes country/city combinations with total freight greater than 1000 in the output.
SELECT Customers.Country, Customers.City,
Sum(Orders.Freight) AS SumOfFreight
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
GROUP BY Customers.Country, Customers.City
HAVING (((Sum(Orders.Freight))>1000))
Union queries enable you to combine rather than join data from more than one table. A typical example of a union query is one that combines data from a Products table and a DiscontinuedProducts table. Another example is a query that combines data from a Customers table and a CustomerArchive table. Here’s an example of a union query:
SELECT ProductID, ProductName, UnitPrice FROM Products
UNION ALL
SELECT ProductID, ProductName, UnitPrice FROM DiscontinuedProducts
This example outputs all rows from the Products table as well as from the DiscontinuedProducts table (see Figure 7.11).
Figure 7.11. A SELECT
statement that outputs all rows from the Products table as well as from the DiscontinuedProducts table.
The Northwind database does not contain a DiscontinuedProducts table. To follow along with this example, you need to create a table with the same structure as the Products table, except the primary key field should not be an IDENTITY
column.
If you want to order the results, you must place the ORDER BY
statement after the second SELECT
statement:
SELECT ProductID, ProductName, UnitPrice FROM Products
UNION ALL
SELECT ProductID, ProductName, UnitPrice FROM DiscontinuedProducts
ORDER BY UnitPrice DESC
In this example, SQL Server combines the results of both SELECT
statements in descending order by UnitPrice.
A subquery is a query that SQL Server evaluates before it evaluates the main query. Here’s an example:
SELECT CustomerID, CompanyName, City, Country FROM Customers WHERE
Customer ID Not In(Select CustomerID FROM Orders)
In this example, SQL Server executes the statement that selects data from the Orders table before it evaluates the statement that selects data from the Customers table (see Figure 7.12).
Figure 7.12. A SELECT
statement that selects data from the Orders table before it evaluates the statement that selects data from the Customers table.
This is not a very efficient method of accomplishing the task of finding all the customers without orders. A better solution would be to use an outer join to solve this problem. You could modify the SQL statement to look like this:
SELECT Customers.CustomerID, CompanyName, City, Country
FROM Customers LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
WHERE Orders.CustomerID Is Null
This example uses a left outer join to select all customers who do not have orders. Because this uses a left outer join, customers are included whether or not they have orders. Because the criteria designate that only rows with a null CustomerID appear in the output, only Customers without orders are included (see Figure 7.13).
Figure 7.13. A SELECT
statement that uses a left outer join to select all customers who do not have orders.
In a world of normalized data, it is important that you understand how to use queries to join your tables back together. This chapter showed you how to join table data. You learned about the various join types available to you. You also learned how to group data, and how to work with aggregate functions. You learned about the HAVING
clause, a clause necessary for working with aggregated data. Finally, you learned about two special types of queries: union queries and subqueries.
Why must you join tables together in a query?
When you build a system based on normalized table structures, you must join the tables back together to see the data in a useable format.
Explain what a full join is.
A full join combines the behavior of a left outer join and a right outer join. An example is where you show all customers whether or not they have orders, and all orders whether or not they are associated with a customer.
Explain the difference between the HAVING
clause and the WHERE
clause.
SQL Server Express applies the HAVING
clause after it summarizes the data, whereas it applies the WHERE
clause before it summarizes the data.
Describe a subquery.
SQL Server Express evaluates a subquery before it evaluates the main query.
2. What is the purpose of aggregate functions?
3. Name four aggregate functions.
4. One of the aggregate functions is COUNT LARGE
(true/false).
Practice joining the Customers and Orders tables in the Northwind sample database with an inner join, a left-outer join, a right-outer join, and a full join. Add a few customers without orders. Note that the number of rows in the output differs for the various join types. Perform a self-join by joining the EmployeeID in the Employees table to the ReportsTo field in the Employees table. Output the LastName and FirstName fields from each table instance used in the query. You should see a list of all employees and their supervisors. Finally, find the sum, average, minimum, and maximum freight amounts in the Orders table.
13.59.177.14