Chapter 7. Adding Dimension with Subqueries and Domain Aggregate Functions

Often, you will carry out your analyses in layers, each layer of analysis using or building on the previous layer. Building layers into analytical processes is actually very common. For instance, when you build a query using another query as the data source, you are layering your analysis. When you build a query based on a temporary table created by a make-table query, you are also layering your analysis.

All these conventional methods of layering analyses have two things in common.

  • They all add a step to your analytical processes. Every query that has to be run in order to feed another query, or every temporary table that has to be created in order to advance your analysis, adds yet another task that must be completed before you get your final results.

  • They all require the creation of temporary tables or transitory queries, inundating your database with table and query objects that lead to a confusing analytical process as well as a database that bloats easily.

This is where subqueries and domain aggregate functions can help. Subqueries and domain aggregate functions allow you to build layers into your analysis within one query, eliminating the need for temporary tables or transitory queries.

Note

The topic of subqueries and domain aggregate functions requires an understanding of SQL (Structured Query Language). Most beginning Access users don't have the foundation in SQL. If you fall into this category, press the pause button here and review Appendix B of this book. There, you'll receive enough of a primer on SQL to continue this chapter.

Enhancing Your Analysis with Subqueries

Subqueries (sometimes referred to as subselect queries) are select queries nested in other queries. The primary purpose of a subquery is to enable you to use the results of one query within the execution of another. With subqueries, you can answer a multiple-part question, specify criteria for further selection, or define new fields for use in your analysis.

The query shown in Figure 7-1 demonstrates how to use a subquery in the design grid. As you look at this, remember that this is just one example of how a subquery can be used. Subqueries are not limited to use as criteria.

To use a subquery in Query Design view, simply enter the SQL statement.

Figure 7.1. To use a subquery in Query Design view, simply enter the SQL statement.

If you were to build the query in Figure 7-1 and switch to SQL view, you would see a SQL statement similar to this one. Can you pick out the subquery? Look for the second SELECT statement.

SELECT CustomerID, Sum(LineTotal) AS SumOfLineTotal
FROM Dim_Transactions
WHERE CustomerID IN
(SELECT [CustomerID] FROM [Dim_Customers] WHERE [State] = "CA")
GROUP BY CustomerID

Note

Subqueries must always be enclosed in parentheses.

The idea behind a subquery is that the subquery is executed first, and the results are used in the outer query (the query in which the subquery is embedded) as a criterion, an expression, a parameter, and so on. In the example shown in Figure 7-1, the subquery first returns a list of CustomerIDs for customers in California. Then the outer query uses that list as criteria to filter out any CustomerIDs not in California.

Why Use Subqueries?

You should know that subqueries often run more slowly than standard queries using joins. This is because subqueries either are executed against an entire dataset or are evaluated multiple times—one time per each row processed by the outer query. This makes them slow to execute, especially if you have a large dataset. So why use them?

Many analyses require multiple step processes that use temporary tables or transitory queries. Although there is nothing inherently wrong with temporary tables and queries, an excess number of them in your analytical processes could lead to a confusing analytical process as well as a database that bloats easily.

Even though using subqueries comes with a performance hit, it may be an acceptable trade for streamlined procedures and optimized analytical processes. You will even find that as you become more comfortable with writing your own SQL statements, you will use subqueries in on-the-fly queries to save time.

Subquery Ground Rules

There are a few rules and restrictions that you must be aware of when using subqueries.

  • Your subquery must have, at a minimum, a SELECT statement and a FROM clause in its SQL string.

  • You must enclose your subquery in parentheses.

  • Theoretically, you can nest up to 31 subqueries within a query. This number, however, is based on your system's available memory and the complexity of your subqueries.

  • You can use a subquery in an expression as long as it returns a single value.

  • You cannot use the DISTINCT keyword in a subquery that includes the GROUP BY clause.

Creating Subqueries Without Typing SQL Statements

You may have a tendency to shy away from subqueries because you may feel uncomfortable with writing your own SQL statements. Indeed, many of the SQL statements necessary to perform the smallest analysis can seem daunting.

Imagine, for example, that you have been asked to provide the number of account managers that have a time in service greater than the average time in service for all account managers. Sounds like a relatively simple analysis, and it is simple when you use a subquery. But where do you start? Well, you could just write an SQL statement into the SQL view of a query and run it. But the truth is that not many Access users create SQL statements from scratch. The smart ones utilize the built-in functionalities of Access to save time and avoid headaches. The trick is to split the analysis into manageable pieces, as shown in the following steps:

  1. Find the average time in service for all account managers. To do this, create the query shown in Figure 7-2.

    Create a query to find the average time in service for all account managers.

    Figure 7.2. Create a query to find the average time in service for all account managers.

  2. Switch to SQL view, shown in Figure 7-3, and copy the SQL statement.

    Switch to SQL view and copy the SQL statement.

    Figure 7.3. Switch to SQL view and copy the SQL statement.

  3. Create a query that will count the number of account managers by time in service. Figure 7-4 does just that.

  4. Right-click in the Criteria row under the TIS_in_Months field and select Zoom. This opens the Zoom dialog shown in Figure 7-5. The Zoom dialog does nothing more than help you more comfortably work with text that is too long to be easily seen at one time in the query grid.

  5. With the Zoom dialog box open, paste the SQL statement you copied previously into to the white input area.

    Note

    Remember that subqueries must be enclosed in parentheses, so you want to enter parentheses around the SQL statement you just pasted. You also need to make sure you delete all carriage returns that were put in automatically by Access.

  6. Finish off the query by entering a greater than (>) sign in from of your subquery and change the aggregate function of the TIS_in_Months row to a WHERE clause. At this point, your query should look like the one shown in Figure 7-6.

    Create a query to count the number of employees by time in service.

    Figure 7.4. Create a query to count the number of employees by time in service.

Paste the first SQL statement you copied into the Criteria row of the TIS_IN_MONTHS field.

Figure 7.5. Paste the first SQL statement you copied into the Criteria row of the TIS_IN_MONTHS field.

Now if you go to the SQL view of the query shown in Figure 7-6, you will see the following SQL statement:

SELECT Count(AccountManagerID) AS MyCount
FROM Dim_AccountManagers
WHERE (((DateDiff("m",[HireDate],Date()))
>(SELECT Avg(DateDiff("m",[HireDate],Date())) AS Avg_TIS_in_Months
FROM Dim_AccountManagers;)));

The beauty is that you do not have to type all this syntax. You simply use your knowledge of Access to piece together the necessary actions that needed to be taken in order to get to the answer. As you become more familiar with SQL, you will find that you can create subqueries manually with no problems.

Running this query tells you there are 12 account managers that have a time in service greater than the company average.

Figure 7.6. Running this query tells you there are 12 account managers that have a time in service greater than the company average.

Using IN and NOT IN with Subqueries

The IN and NOT IN operators enable you to run two queries in one. The idea is that the subquery executes first, and then the outer query uses the resulting dataset to filter the final output.

The example demonstrated in Figure 7-7 first runs a subquery that selects all customers based in CA (California). The outer query then uses the resulting dataset as a criteria to return the sum of LineTotal for only those customers that match the customer numbers returned in the subquery.

This query uses the IN operator with a subquery, allowing you to run two queries in one.

Figure 7.7. This query uses the IN operator with a subquery, allowing you to run two queries in one.

You would use NOT IN to go the opposite way and return the sum of LineTotal for customers that do not match the customer numbers returned in the subquery.

Tip

You can find the query examples in this section in the sample database for this book, located at www.wiley.com.

Using Subqueries with Comparison Operators

As its name implies, a comparison operator (=, <, >, <=, >=, <>, and so on) compares two items and returns True or False. When you use a subquery with a comparison operator, you are asking Access to compare the resulting dataset of your outer query to that of the subquery.

For example, to return all customers who have purchases greater than the average purchase for all customers, you can use the following query (Figure 7-8):

Use comparison operators to compare the resulting dataset of your outer query to the results of the subquery.

Figure 7.8. Use comparison operators to compare the resulting dataset of your outer query to the results of the subquery.

The subquery runs first, giving you the average purchase for all customers. This is a single value that Access then uses to compare the outer query's resulting dataset. In other words, the max purchase for each customer is compared to the company average. If a customer's maximum purchase is greater than the company average, it is included in the final output; otherwise, it is excluded.

Note

A subquery used with a comparison operator must return a single value.

Using Subqueries as Expressions

In every example so far, you have used subqueries in conjunction with the WHERE clause, effectively using the results of a subquery as criteria for your outer query. However, you can also use a subquery as an expression, as long as the subquery returns a single value. The query shown in Figure 7-9 demonstrates how you can use a subquery as an expression in a calculation.

You are using a subquery as an expression in a calculation.

Figure 7.9. You are using a subquery as an expression in a calculation.

This example uses a subquery to get the average units sold by the entire company; that subquery returns a single value. You are then using that value in a calculation to determine the variance between each market's average units sold and the average for the company. The output of this query is shown in Figure 7-10.

Your query result.

Figure 7.10. Your query result.

Using Correlated Subqueries

A correlated query is essentially a subquery that refers back to a column that is in the outer query. What makes correlated subqueries unique is that while standard subqueries are evaluated one time to get a result, a correlated subquery has to be evaluated multiple times: once for each row processed by the outer query. To illustrate this point, consider the following two SQL statements.

Uncorrelated Subqueries

This SQL statement uses an uncorrelated subquery. How can you tell? The subquery does not reference any column in the outer query. This subquery is evaluated one time to give you the average revenue for the entire dataset.

SELECT MainSummary.Branch_Number,
     (SELECT Avg(Revenue)FROM MainSummary)
     FROM MainSummary

Correlated Subqueries

This SQL statement uses a correlated subquery. The subquery reaches back into the outer query and references the Branch_Number column, effectively forcing the subquery to be evaluated for every row that is processed by the outer query. The result of this query is a dataset that shows the average revenue for every branch in the company. Figure 7-11 demonstrates how this SQL statement looks in Design View.

SELECT MainSummary.Branch_Number,
     (SELECT Avg(Revenue)FROM MainSummary AS M2
     WHERE M2.Branch_Number = MainSummary.Branch_Number) AS AvgByBranch
FROM MainSummary
GROUP BY MainSummary.Branch_Number
A correlated subquery

Figure 7.11. A correlated subquery

Using a Correlated Subquery as an Expression

The example shown in Figure 7-9 uses an uncorrelated subquery to determine the variance between each market's average units sold and the average units for the company.

You can apply the same type of technique to correlated subqueries. In the query demonstrated in Figure 7-14, a correlation for each branch number allows you to determine the variance between each employee's annual revenue and the average revenue for that employee's branch.

You can use a correlated subquery as part of an expression.

Figure 7.14. You can use a correlated subquery as part of an expression.

Using Subqueries Within Action Queries

Action queries can be fitted with subqueries just as easily as select queries can. Here are a few examples of how you would use a subquery in an action query.

A Subquery in a Make-Table Query

This example illustrates how to use a subquery within a make-table query.

SELECT E1.Employee_Number, E1.Last_Name, E1.First_Name
INTO OldSchoolEmployees
FROM Employee_Master as E1
WHERE E1.Employee_Number IN
     (SELECT E2.Employee_Number
     FROM Employee_Master AS E2
     WHERE E2.Hire_Date <#1/1/1995#)

A Subquery in an Append Query

This example uses a subquery within an append query.

INSERT INTO CustomerMaster (Customer_Number, Customer_Name, State )
SELECT CompanyNumber,CompanyName,State
FROM LeadList
WHERE CompanyNumber Not In
     (SELECT Customer_Number FROM CustomerMaster)

A Subquery in an Update Query

This example uses a subquery in an update query.

UPDATE PriceMaster SET Price = [Price]*1.1
WHERE Branch_Number In
     (SELECT Branch_Number FROM LocationMaster WHERE Region = "South")

A Subquery in a Delete Query

This example uses a subquery in a delete query.

DELETE CompanyNumber
FROM LeadList
WHERE CompanyNumber In
     (SELECT Customer_Number FROM CustomerMaster)

Domain Aggregate Functions

Domain aggregate functions enable you to extract and aggregate statistical information from an entire dataset (a domain). These functions differ from aggregate queries in that aggregate queries group data before evaluating the values, whereas domain aggregate functions evaluate the values for entire datasets; thus, a domain aggregate function never returns more than one value. To get a clear understanding of the difference between an aggregate query and a domain aggregate function, build the query shown in Figure 7-20.

This query shows you the difference between an aggregate query and a domain aggregate function.

Figure 7.20. This query shows you the difference between an aggregate query and a domain aggregate function.

Run the query to get the results you see in Figure 7-21. You will notice that the Aggregate Sum column contains a different total for each year, whereas the Domain Sum column (the domain aggregate function) contains only one total (for the entire dataset).

You can clearly see the difference between an aggregate query and a domain aggregate function.

Figure 7.21. You can clearly see the difference between an aggregate query and a domain aggregate function.

Note

Although the examples in this chapter show the use of domain aggregate functions in query expressions, keep in mind that you can use these functions in macros, modules, or the calculated controls of forms and reports.

Understanding the Different Domain Aggregate Functions

There are 12 different domain aggregate functions in Access, each one performing a different operation. The Table 7-1 lists each function with its purpose and utility.

Table 7.1. Domain Aggregate Functions

FUNCTION

PURPOSE

DSum

The DSum function returns the sum value of a specified field in the domain. For example, DSum("[LineTotal]", "[Dim_Transactions]") returns the total sum of LineTotal in the Dim_Transactions table.

DAvgf

The DAvg function returns the average value of a specified field in the domain. For example, DAvg("[LineTotal]", "[Dim_Transactions]") returns the average LineTotal in the Dim_Transactions table.

DCount

The DCount function returns the total number of records in the domain. DCount("*", "[Dim_Transactions]"), for example, returns the total number of records in the Dim_ Transactions table.

DLookup

The DLookup function returns the first value of a specified field that matches the criteria you define within the DLookup function. If you don't supply criteria, the DLookup function returns a random value in the domain. For example, DLookUp("[Last_Name]","[Employee_Master]"," [Employee_Number]='42620' ") returns the value in the Last_Name field of the record where the Employee_Number is 42620.

DMin and DMax

The DMin and DMax functions return the minimum and maximum values in the domain, respectively. DMin("[LineTotal]","[Dim_Transactions]") returns the lowest LineTotal in the Dim_Transactions table, whereas DMax("[LineTotal]","[Dim_Transactions]") returns the highest LineTotal.

DFirst and DLast

The DFirst and DLast functions return the first and last values in the domain, respectively. DFirst("[LineTotal]", "[Dim_Transactions]") returns the first LineTotal in the Dim_Transactions table, whereas DLast("[LineTotal]", "[Dim_Transactions]") returns the last.

DStDev, DStDevP, DVar, and DvarP

You can use the DStDev and DStDevP functions to return the standard deviation across a population sample and a population, respectively. Similarly, the DVar and the DVarP functions return the variance across a population sample and a population, respectively. DStDev("[List_Price]", "[Dim_Products]") returns the standard deviation of all prices in the Dim_Products table. DVar ("[List_Price]", "[Dim_Products]") returns the variance of all prices in the Dim_Products table.

Note

DLookup functions are particularly useful when you need to retrieve a value from an outside dataset.

Examining the Syntax of Domain Aggregate Functions

Domain aggregate functions are unique in that the syntax required to make them work varies depending on the scenario. This has produced some very frustrated users who have given up on domain aggregate functions altogether. This section describes some general guidelines that help you in building your domain aggregate functions.

Using No Criteria

In this example, you are summing the values in the LineTotal field from the Dim_Transactions table (domain). Your field names and dataset names must always be wrapped in quotes.

DSum("[LineTotal]","[Dim_Transactions]")

Also, note the use of brackets. Although not always required, it is generally a good practice to use brackets when identifying a field, a table, or a query.

Using Text Criteria

In this example, you are summing the values in the Revenue field from the PvTblFeed table (domain) where the value in the Branch_Number field is 301316. Note that the Branch_Number field is formatted as text. When specifying a criterion that is textual or a string, your criterion must be wrapped in single quotes. In addition, your entire criteria expression must be wrapped in double quotes.

DSum("[Revenue]", "[PvTblFeed]", "[Branch_Number] = '301316' ")

Tip

You can use any valid WHERE clause in the criteria expression of your domain aggregate functions. This adds a level of functionality to domain aggregate functions, as they can support the use of multiple columns and logical operators such as AND, OR, NOT, and so on. An example is:

DSum("[Field1]", "[Table]", "[Field2] = 'A' OR [Field2] = 'B'
AND [Field3] = 2")

If you are referencing a control inside of a form or report, the syntax will change a bit.

DSum("[Revenue]", "[PvTblFeed]", "[Branch_Number] = 
Using Text Criteria
' " & [MyTextControl] & " ' " )

Notice that you are using single quotes to convert the control's value to a string. In other words, if the value of the form control is 301316, then "[Branch_Number] = ' " & [MyTextControl] & " ' " is essentially translated to read "[Branch_Number] = '301316' ".

Using Number Criteria

In this example, you are summing the values in the LineTotal field from the Dim_Transactions table (domain) where the value in the LineTotal field is greater than 500. Notice that you are not using single quotes, as the LineTotal field is an actual number field.

DSum("[LineTotal]", "[Dim_Transactions]", "[LineTotal] > 500 ")

If you are referencing a control inside of a form or report, the syntax changes a bit.

DSum("[LineTotal]", "[Dim_Transactions]", "[LineTotal] >" 
Using Number Criteria
[MyNumericControl])

Using Date Criteria

In this example, you are summing the values in the LineTotal field from the Dim_Transactions table (domain) where the value in the OrderDate field is 07/05/2008.

DSum("[LineTotal]", "[Dim_Transactions]", "[OrderDate] = #07/05/08# ")

If you are referencing a control inside of a form or report, the syntax changes a bit.

DSum("[LineTotal]", "[Dim_Transactions]", "[OrderDate] = 
Using Date Criteria
#" & [MydateControl] & "#")

Notice that you are using number signs to convert the control's value to a date. In other words, if the value of the form control is 07/05/2008, then "[OrderDate] = #" & [MydateControl] & "#" is essentially translated to read "[OrderDate] = #07/05/2008# ".

Using Domain Aggregate Functions

Like subqueries, domain aggregate functions are not very efficient when it comes to performing large-scale analyses and crunching very large datasets. These functions are better suited for use in specialty analyses with smaller subsets of data. Indeed, you most often find domain aggregate functions in environments where the dataset being evaluated is predictable and controlled (form example, functions, forms, and reports). This is not to say, however, that domain aggregate functions don't have their place in your day-to-day data analysis. This section walks through some examples of how you can use domain aggregate functions to accomplish some common tasks.

Calculating the Percent of Total

The query shown in Figure 7-22 returns products by group and the sum of LineTotal for each product category. This is a worthwhile analysis, but you can easily enhance it by adding a column that gives you the percent of total revenue for each product.

You want to add a column that shows the percent of total revenue for each product category.

Figure 7.22. You want to add a column that shows the percent of total revenue for each product category.

To get the percent of the total dollar value that each product makes up, you naturally have to know the total dollar value of the entire dataset. This is where a DSum function can come in handy. The following DSum function returns the total value of the dataset:

DSum("[LineTotal]","[Dim_Transactions]")

Now you can use this function as an expression in the calculation that returns the "percent of total" for each product group. Figure 7-23 demonstrates how.

Use a DSum function as an expression in a calculation to get "percent of total."

Figure 7.23. Use a DSum function as an expression in a calculation to get "percent of total."

The result, shown in Figure 7-24, proves that this is a quick and easy way to get both total by group and percent of total with one query.

You retrieved both total by group and percent of total with one query.

Figure 7.24. You retrieved both total by group and percent of total with one query.

Creating a Running Count

The query in Figure 7-25 uses a DCount function as an expression to return the number of invoices processed on each specific invoice day.

This query returns all invoice dates and the number of invoices processed on each date.

Figure 7.25. This query returns all invoice dates and the number of invoices processed on each date.

Take a moment to analyze what the DCount function is doing.

DCount("[TransactionID]","[Dim_Transactions]","[OrderDate]= 
This query returns all invoice dates and the number of invoices processed on each date.
#" & [OrderDate] & "#")

This DCount function retrieves the count of invoices where the invoice date equals (=) each invoice date returned by the query. In the context of the query shown in Figure 7-25, the resulting dataset shows each invoice date and its own count of invoices.

What would happen if you were to alter the DCount function to tell it to return the count of invoices where the invoice date equals or is earlier than (<=) each invoice date returned by the query, as follows?

DCount("[TransactionID]","[Dim_Transactions]","[OrderDate]<= 
This query returns all invoice dates and the number of invoices processed on each date.
#" & [OrderDate] & "#")

The DCount function would return the count of invoices for each date and the count of invoices for any earlier date, thereby giving you a running count.

To put this into action, simply replace the = operator in the DCount function with the <= operator, as shown in Figure 7-26.

Use the <= operator in your DCount function to return the count of invoice dates that equals or is less than the date returned by the query.

Figure 7.26. Use the <= operator in your DCount function to return the count of invoice dates that equals or is less than the date returned by the query.

Figure 7-27 shows the resulting running count.

You now have a running count in your analysis.

Figure 7.27. You now have a running count in your analysis.

Tip

You can achieve a running sum instead of a running count by using the DSum function.

Using a Value from the Previous Record

The query in Figure 7-28 uses a DLookup function to return the revenue value from the previous record. This value is placed into a new column called "Yesterday."

This method is similar to the one used when creating a running sum in that it revolves around manipulating a comparison operator in order to change the meaning of the domain aggregate function. In this case, DLookup searches for the revenue value where the invoice date is equal to each invoice date returned by the query minus one (-1). If you subtract one from a date, you get yesterday's date!

DLookUp("[Revenue]","[TimeSummary]","[OrderDate] = 
Using a Value from the Previous Record
#" & [OrderDate]-1 & "#")

Tip

If you add 1, you get the next record in the sequence. However, this trick does not work with textual fields. This only works with date and numeric fields. If you are working with a table that does not contain any numeric or date fields, create an autonumber field. This gives you a unique numeric identifier that you can use.

This query uses a DLookup to refer to the previous revenue value.

Figure 7.28. This query uses a DLookup to refer to the previous revenue value.

Running the query in Figure 7-28 yields the results shown in Figure 7-29.

You can take this functionality a step further and perform a calculation on the Yesterday field.

Figure 7.29. You can take this functionality a step further and perform a calculation on the Yesterday field.

You can enhance this analysis by adding a calculated field that gives you the dollar variance between today and yesterday. Create a new column and enter [Revenue]-NZ([Yesterday],0), as shown in Figure 7-30. Note that the Yesterday field is wrapped in an NZ function in order to avoid errors caused by null fields.

Enhance your analysis by adding a variance between today and yesterday.

Figure 7.30. Enhance your analysis by adding a variance between today and yesterday.

Figure 7-31 shows the result.

Another task made possible by domain aggregate functions

Figure 7.31. Another task made possible by domain aggregate functions

Summary

Subqueries and domain aggregate functions allow you to build layers into your analysis within one query, eliminating the need for temporary tables or transitory queries. You can leverage both subqueries and domain aggregate functions to streamline your analytical processes, as well as expand and enhance your analysis.

Subqueries are select queries nested within other queries, allowing you to use the results of one query within the execution of another. The idea behind a subquery is that the subquery is executed first, and the results are used in the outer query (the query in which the subquery is embedded) as a criterion, an expression, a parameter, and so on. Although subqueries often run more slowly than standard queries using joins, there are situations where the performance hit may be an acceptable trade for streamlined procedures and optimized analytical processes.

Domain aggregate functions enable you to extract and aggregate statistical information from an entire dataset (a domain). Unlike aggregate queries where the data is grouped before evaluation, domain aggregate functions evaluate the values for the entire dataset. There are 12 different domain aggregate functions: DSum, DAvg, DCount, DLookup, DMin, DMax, DFirst, DLast DStDev, DStDevP, DVar, and DVarP. Domain aggregate functions are ideal for specialty analyses such as calculating the percent of total, creating a running count, creating a running sum, or using values from previous records.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.93.122