Chapter 8
In This Chapter
Reviewing function basics
Using the DAVERAGE function
Using the DCOUNT and DCOUNTA functions
Using the DGET function
Using the DMAX and DMIN functions
Using the DPRODUCT function
Using the DSTDEV and DSTDEVP functions
Using the DSUM function
Using the DVAR and DVARP functions
Excel provides a special set of functions, called database functions, especially for simple statistical analysis of information that you store in Excel tables. In this chapter, I describe and illustrate these functions.
Note: Excel also provides a rich set of statistical functions, which are also wonderful tools for analyzing information in an Excel table. Skip to Chapter 9 for details on these statistical functions.
The Excel database functions work like other Excel functions. In a nutshell, when you want to use a function, you create a formula that includes the function. Because I don’t discuss functions in detail anywhere else in this book — and because you need to be relatively proficient with the basics of using functions in order to employ them in any data analysis — I review some basics here, including function syntax and entering functions.
Most functions need arguments, or inputs. In particular, all database functions need arguments. You include these arguments inside parentheses. If a function needs more than one argument, you can separate arguments by using commas.
For illustration purposes, here are a couple of example formulas that use simple functions. These aren’t database functions, by the way. I get to those in later sections of this chapter. Read through these examples to become proficient with the everyday functions. (Or just breeze through these as a refresher.)
You use the SUM function to sum, or add up, the values that you include as the function arguments. In the following example, these arguments are 2, 2, the value in cell A1, and the values stored in the worksheet range B3:G5.
=SUM(2,2,A1,B3:G5)
Here’s another example. The following AVERAGE function calculates the average, or arithmetic mean, of the values stored in the worksheet range B2:B100.
=AVERAGE(B2:B100)
Simply, that’s what functions do. They take your inputs and perform some calculation, such as a simple sum or a slightly more complicated average.
How you enter a function-based formula into a cell depends on whether you’re familiar with how the function works — at least roughly.
If you’re familiar with how a function works — or at the very least, you know its name — you can simply type an equals sign followed by the function name into the cell. SUM and AVERAGE are good examples of easy-to-remember function names. When you type that first parenthesis [(] after entering the full function name, Excel displays a pop-up ScreenTip that names the function arguments and shows their correct order. (Refer to the previous section, “Understanding function syntax rules,” if you need to brush up on some mechanics.) In Figure 8-1, for example, you can see how this looks in the case of the loan payment function, which is named PMT.
If you’re not familiar with how a function works — maybe you’re not even sure what function that you want to use — you need to use the Formulas tab’s Insert Function command to find the function and then correctly identify the arguments.
To use the Function Wizard command in this manner, follow these steps:
You do this in the usual way. For example, you can click the cell. Or you can use the navigation keys, such as the arrow keys, to move the cell selector to the cell.
Excel displays the Insert Function dialog box, as shown in Figure 8-2.
For example, if you want to calculate a standard deviation for a sample, type something like standard deviation.
In the Select a Function list box, Excel displays a list of the functions that might just work for you, as shown in Figure 8-3.
Note: The STDEVPA function in Figure 8-3 isn’t a database function, so I don’t describe it in this chapter. Read through Chapter 9 for more on this function.
To find the right function for your purposes, first select a function in the Select a Function list. Then read the full description of the function, which appears beneath the function list. If the function you select isn’t the one you want, select another function and read its description. Repeat this process until you find the right function.
If you get to the end of the list of functions and still haven’t found what you want, consider repeating Step 3, but this time use a different (and hopefully better) description of the calculation you want to make.
Excel displays the Function Arguments dialog box, as shown in Figure 8-4.
To supply the arguments that a function needs, click an argument text box (Value1 and Value2 in Figure 8-4). Next, read the argument description, which appears at the bottom of the dialog box. Then supply the argument by entering a value, formula, or cell or range reference into the argument text box.
If a function needs more than one argument, repeat this step for each argument.
Excel calculates the function result based on the arguments that you enter and displays this value at the bottom of the dialog box next to Formula Result =, as shown in Figure 8-5.
If you need help using some function, your first resource — yes, even before you check this chapter — should be to click the Help on This Function hyperlink, which appears in the bottom-left corner of the Function Arguments dialog box. In Figure 8-6, you can see the help information that Excel displays for the STDEVPA function.
And now it’s party time. In the next section, I describe each of the database statistical functions that Excel provides.
The DAVERAGE function calculates an average for values in an Excel list. The unique and truly useful feature of DAVERAGE is that you can specify that you want only list records that meet specified criteria included in your average.
The DAVERAGE function uses the following syntax:
=DAVERAGE(database,field,criteria)
where database is a range reference to the Excel table that holds the value you want to average, field tells Excel which column in the database to average, and criteria is a range reference that identifies the fields and values used to define your selection. The field argument can be a cell reference holding the field name, the field name enclosed in quotation marks, or a number that identifies the column (1 for the first column, 2 for the second column, and so on).
As an example of how the DAVERAGE function works, suppose that you’ve constructed the worksheet shown in Figure 8-7. Notice that the worksheet range holds a small table. Row 1 predictably stores field names: Name, State, and Donation. Rows 2–11 store individual records.
Rows 14 and 15 store the criteria range. The criteria range typically duplicates the row of field names. The criteria range also includes at least one other row of labels or values or Boolean logic expressions that the DAVERAGE function uses to select records from the list. In Figure 8-7, for example, note the Boolean expression in cell C15, <500, which tells the function to include only records where the Donation field shows a value less than 500.
The DAVERAGE function, which appears in cell F3, is
=DAVERAGE(A1:C11,"Donation",A14:C15)
and it returns the average donation amount shown in the database list, excluding the donation from Jeannie in California because that amount isn’t less than 500. The actual function result is 63.88889.
Although I mention this in a couple of other places in this book, I want to repeat something important: Each row in your criteria range is used to select records for the function. For example, if you use the criteria range shown in Figure 8-8, you select records using two criteria. The criterion in row 15 tells the DAVERAGE function to select records where the donation is less than 500. The criterion in row 16 tells the DAVERAGE function to select records where the state is California. The DAVERAGE function, then, uses every record in the list because every record meets at least one of the criteria. The records in the list don't have to meet both criteria; just one of them.
To combine criteria — suppose that you want to calculate the DAVERAGE for donations from California that are less than 500 — you put both the criteria into the same row, as shown in row 15 of Figure 8-9.
The DCOUNT and DCOUNTA functions count records in a database table that match criteria that you specify. Both functions use the same syntax, as shown here:
=DCOUNT(database,field,criteria)
=DCOUNTA(database,field,criteria)
where database is a range reference to the Excel table that holds the value that you want to count, field tells Excel which column in the database to count, and criteria is a range reference that identifies the fields and values used to define your selection criteria. The field argument can be a cell reference holding the field name, the field name enclosed in quotation marks, or a number that identifies the column (1 for the first column, 2 for the second column, and so on).
The functions differ subtly, however. DCOUNT counts fields with values; DCOUNTA counts fields that aren’t empty.
As an example of how the DCOUNT and DCOUNTA functions work, suppose that you’ve constructed the worksheet shown in Figure 8-10, which contains a list of players on a softball team. Row 1 stores field names: Player, Age, and Batting Average. Rows 2–11 store individual records.
Rows 14 and 15 store the criteria range. Field names go into the first row. Subsequent rows provide labels or values or Boolean logic expressions that the DCOUNT and DCOUNTA functions use to select records from the list for counting. In Figure 8-10, for example, there’s a Boolean expression in cell B15, >8, which tells the function to include only records where the Age shows a value greater than eight. In this case, then, the functions count players on the team who are older than 8.
The DCOUNT function, which appears in cell F3, is
=DCOUNT(A1:C11,C1,A14:C15)
The function counts the players on the team who are older than 8. But because the DCOUNT function looks only at players with a batting average in the Batting Average field, it returns 8. Another way to say this same thing is that in this example, DCOUNT counts the number of players on the team who are older than 8 and have a batting average.
The DCOUNTA function, which appears in cell F5, is
=DCOUNTA(A1:C11,3,A14:C15)
The function counts the players on the team who are older than 8 and have some piece of information entered into the Batting Average field. The function returns the value 9 because each of the players older than 8 have something stored in the Batting Average field. Eight of them, in fact, have batting average values. The fifth player (Christina) has the text label NA.
=DCOUNT(A1:C11,A14:C15)
=DCOUNTA(A1:C11,A14:C15)
Note: To omit an argument, you just leave the space between the two commas empty.
The DGET function retrieves a value from a database list according to selection criteria. The function uses the following syntax:
=DGET(database,field,criteria)
where database is a range reference to the Excel table that holds the value you want to extract, field tells Excel which column in the database to extract, and criteria is a range reference that identifies the fields and values used to define your selection criteria. The field argument can be a cell reference holding the field name, the field name enclosed in quotation marks, or a number that identifies the column (1 for the first column, 2 for the second column, and so on).
Go back to the softball players list example in the preceding section. Suppose that you want to find the batting average of the single eight-year-old player. To retrieve this information from the list shown in Figure 8-11, enter the following formula into cell F3:
=DGET(A1:C11,3,A14:C15)
This function returns the value 0.444 because that’s the eight-year-old’s batting average.
The DMAX and DMIN functions find the largest and smallest values, respectively, in a database list field that match the criteria that you specify. Both functions use the same syntax, as shown here:
=DMAX(database,field,criteria)
=DMIN(database,field,criteria)
where database is a range reference to the Excel table, field tells Excel which column in the database to look in for the largest or smallest value, and criteria is a range reference that identifies the fields and values used to define your selection criteria. The field argument can be a cell reference holding the field name, the field name enclosed in quotation marks, or a number that identifies the column (1 for the first column, 2 for the second column, and so on).
As an example of how the DMAX and DMIN functions work, suppose you construct a list of your friends and some important statistical information, including their typical golf scores and their favorite local courses, as shown in Figure 8-12. Row 1 stores field names: Friend, Golf Score, and Course. Rows 2–11 store individual records.
Rows 14 and 15 store the criteria range. Field names go into the first row. Subsequent rows provide labels or values or Boolean logic expressions that the DMAX and DMIN functions use to select records from the list for counting. In Figure 8-12, for example, note the text label in cell C15, Snohomish, which tells the function to include only records where the Course field shows the label Snohomish.
The DMAX function, which appears in cell F3, is
=DMAX(A1:C11,"Golf Score",A14:C15)
The function finds the highest golf score of the friends who favor the Snohomish course, which happens to be 98.
The DMIN function, which appears in cell F5, is
=DMIN(A1:C11,"Golf Score",A14:C15)
The function counts the lowest score of the friends who favor the Snohomish course, which happens to be 96.
The DPRODUCT function is weird. And I’m not sure why you would ever use it. Oh sure, I understand what it does. The DPRODUCT function multiplies the values in fields from a database list based on selection criteria. I just can’t think of a general example about why you would want to do this.
The function uses the syntax
=DPRODUCT(database,field,criteria)
where database is a range reference to the Excel table that holds the value you want to multiply, field tells Excel which column in the database to extract, and criteria is a range reference that identifies the fields and values used to define your selection criteria. If you’ve been reading this chapter from the very start, join the sing-along: The field argument can be a cell reference holding the field name, the field name enclosed in quotation marks, or a number that identifies the column (1 for the first column, 2 for the second column, and so on).
I can’t construct a meaningful example of why you would use this function, so no worksheet example this time. Sorry.
Note: Just so you don’t waste time looking, the Excel Help file doesn’t provide a good example of the DPRODUCT function either.
The DSTDEV and DSTDEVP functions calculate a standard deviation. DSTDEV calculates the standard deviation for a sample. DSTDEVP calculates the standard deviation for a population. As with other database statistical functions, the unique and truly useful feature of DSTDEV and DSTDEVP is that you can specify that you want only list records that meet the specified criteria you include in your calculations.
The DSTDEV and DSTDEVP functions use the same syntax:
=DSTDEV(database,field,criteria)
=DSTDEVP(database,field,criteria)
where database is a range reference to the Excel table that holds the values for which you want to calculate a standard deviation, field tells Excel which column in the database to use in the calculations, and criteria is a range reference that identifies the fields and values used to define your selection criteria. The field argument can be a cell reference holding the field name, the field name enclosed in quotation marks, or a number that identifies the column (1 for the first column, 2 for the second column, and so on).
As an example of how the DSTDEV function works, suppose you construct the worksheet shown in Figure 8-13. (This is the same basic worksheet as shown in Figure 8-7, in case you’re wondering.)
The worksheet range holds a small list with row 1 storing field names (Name, State, and Donation) and rows 2 through 11 storing individual records.
Rows 14 and 15 store the criteria range. The criteria range typically duplicates the row of field names. The criteria range also includes at least one other row of labels or values or Boolean logic expressions that the DSTDEV and DSTDEVP functions use to select records from the list. In Figure 8-13, for example, note the Boolean expression in cell C15, <250, which tells the function to include only records where the Donation field shows a value less than 250.
The DSTDEV function, which appears in cell F3, is
=DSTDEV(A1:C11,"Donation",A14:C15)
and it returns the sample standard deviation of the donation amounts shown in the database list, excluding the donation from Jeannie in California because that amount is not less than 250. The actual function result is 33.33333.
The DSTDEVP function, which appears in cell F5, is
=DSTDEVP(A1:C11,"Donation",A14:C15)
and returns the population standard deviation of the donation amounts shown in the database list excluding the donation from Jeannie in California because that amount isn’t less than 250. The actual function result is 31.42697.
You wouldn’t, by the way, simply pick one of the two database standard deviation functions willy-nilly. If you’re calculating a standard deviation using a sample, or subset of items, from the entire data set, or population, you use the DSTDEV function. If you’re calculating a standard deviation using all the items in the population, use the DSTDEVP function.
The DSUM function adds values from a database list based on selection criteria. The function uses the syntax:
=DSUM(database,field,criteria)
where database is a range reference to the Excel table, field tells Excel which column in the database to sum, and criteria is a range reference that identifies the fields and values used to define your selection criteria. The field argument can be a cell reference holding the field name, the field name enclosed in quotation marks, or a number that identifies the column (1 for the first column, 2 for the second column, and so on).
Figure 8-14 shows a simple bank account balances worksheet that illustrates how the DSUM function works. Suppose that you want to find the total of the balances that you have in open accounts paying more than 0.02, or 2 percent, interest. The criteria range in A14:D15 provides this information to the function. Note that both criteria appear in the same row. This means that a bank account must meet both criteria in order for its balance to be included in the DSUM calculation.
The DSUM formula appears in cell F3, as shown here:
=DSUM(A1:C11,3,A14:D15)
This function returns the value 39000 because that’s the sum of the balances in open accounts that pay more than 2 percent interest.
The DVAR and DVARP functions calculate a variance, which is another measure of dispersion — and actually, the square of the standard deviation. DVAR calculates the variance for a sample. DVARP calculates the variance for a population. As with other database statistical functions, using DVAR and DVARP enable you to specify that you want only those list records that meet selection criteria included in your calculations.
The DVAR and DVARP functions use the same syntax:
=DVAR(database,field,criteria)
=DVARP(database,field,criteria)
where database is a range reference to the Excel table that holds the values for which you want to calculate a variance, field tells Excel which column in the database to use in the calculations, and criteria is a range reference that identifies the fields and values used to define your selection criteria. The field argument can be a cell reference holding the field name, the field name enclosed in quotation marks, or a number that identifies the column (1 for the first column, 2 for the second column, and so on).
As an example of how the DVAR function works, suppose you’ve constructed the worksheet shown in Figure 8-15. (Yup, this is the same worksheet as shown in Figure 8-12.)
The worksheet range holds a small list with row 1 storing field names and rows 2–11 storing individual records.
Rows 14–17 store the criteria, which stipulate that you want to include golfing buddies in the variance calculation if their favorite courses are Snohomish, Snoqualmie, or Carnation. The first row, row 14, duplicates the row of field names. The other rows provide the labels or values or Boolean logic expressions — in this case, just labels — that the DVAR and DVARP functions use to select records from the list.
The DVAR function, which appears in cell F3, is
=DVAR(A1:C11,"Golf Score",A14:C17)
and it returns the sample variance of the golf scores shown in the database list for golfers who golf at Snohomish, Snoqualmie, or Carnation. The actual function result is 161.26786.
The DVARP function, which appears in cell F5, is
=DVARP(A1:C11,"Golf Score",A14:C17)
and it returns the population variance of the golf scores shown in the database list for golfers who golf at Snohomish, Snoqualmie, and Carnation. The actual function result is 141.10938.
As when making standard deviation calculations, you don’t simply pick one of the two database variances based on a whim, the weather outside, or how you’re feeling. If you’re calculating a variance using a sample, or subset of items, from the entire data set, or population, you use the DVAR function. To calculate a variance using all the items in the population, you use the DVARP function.
3.137.221.120