Chapter 10
In This Chapter
Using the Descriptive Statistics tool
Creating a histogram
Ranking by percentile
Calculating moving averages
Using the Exponential Smoothing tool
Sampling a population
In this chapter, I describe and discuss the simple descriptive statistical data analysis tools that Excel supplies through the Data Analysis add-in. I also describe some of the really simple-to-use and easy-to-understand inferential statistical tools provided by the Data Analysis add-in — including the tools for calculating moving and exponential averages as well as the tools for generating random numbers and sampling.
The simple-yet-powerful Data Analysis tools can save you a lot of time. With a single command, for example, you can often produce a bunch of descriptive statistical measures such as mean, mode, standard deviation, and so on. What’s more, the other cool tools that you can use for preparing histograms, percentile rankings, and moving average schedules can really come in handy.
Perhaps the best thing about these tools, however, is that even if you’ve had only a little exposure to basic statistics, none of them are particularly difficult to use. All the hard work and all the dirty work gets done by Excel. All you have to do is describe where the input data is.
Note: You must usually install the Data Analysis tools before you can use them. To install them, go to File⇒Options. When Excel displays the Excel Options dialog box, select the Add-Ins item from the left box that appears along the left edge of the dialog box. Excel next displays a list of the possible add-ins — including the Analysis ToolPak add-in. (The Analysis ToolPak is what the Data Analysis tools are called.) Select the Analysis ToolPak item and click Go. Excel displays the Add-Ins dialog box. Select Analysis ToolPak from this dialog box and click OK. Excel installs the Analysis ToolPak add-in.
Perhaps the most common Data Analysis tool that you'll use is the one for calculating descriptive statistics. To see how this works, take a look at the worksheet shown in Figure 10-1. It summarizes sales data for a book publisher. In column A, the worksheet shows the suggested retail price (SRP). In column B, the worksheet shows the units sold of each book through one popular bookselling outlet. You might choose to use the Descriptive Statistics tool to summarize this data set.
To calculate descriptive statistics for the data set shown in Figure 10-1, follow these steps:
Excel displays the Data Analysis dialog box, as shown in Figure 10-2.
Excel displays the Descriptive Statistics dialog box, as shown in Figure 10-3.
To make it easier to see or select the worksheet range, click the worksheet button at the right end of the Input Range text box. When Excel hides the Descriptive Statistics dialog box, select the range that you want by dragging the mouse. Then click the worksheet button again to redisplay the Descriptive Statistics dialog box.
After you describe where the data is and how the statistics should be calculated, click OK. Figure 10-4 shows a new worksheet with the descriptive statistics calculated, added into a new sheet, Sheet 2. Table 10-1 describes the statistics that Excel calculates.
Table 10-1 The Measures That the Descriptive Statistics Tool Calculates
Statistic |
Description |
Mean |
Shows the arithmetic mean of the sample data. |
Standard Error |
Shows the standard error of the data set (a measure of the difference between the predicted value and the actual value). |
Median |
Shows the middle value in the data set (the value that separates the largest half of the values from the smallest half of the values). |
Mode |
Shows the most common value in the data set. |
Standard Deviation |
Shows the sample standard deviation measure for the data set. |
Sample Variance |
Shows the sample variance for the data set (the squared standard deviation). |
Kurtosis |
Shows the kurtosis of the distribution. |
Skewness |
Shows the skewness of the data set’s distribution. |
Range |
Shows the difference between the largest and smallest values in the data set. |
Minimum |
Shows the smallest value in the data set. |
Maximum |
Shows the largest value in the data set. |
Sum |
Adds all the values in the data set together to calculate the sum. |
Count |
Counts the number of values in a data set. |
Largest(X) |
Shows the largest X value in the data set. |
Smallest(X) |
Shows the smallest X value in the data set. |
Confidence Level(X) Percentage |
Shows the confidence level at a given percentage for the data set values. |
Use the Histogram Data Analysis tool to create a frequency distribution and, optionally, a histogram chart. A frequency distribution shows just how values in a data set are distributed across categories. A histogram shows the same information in a cute little column chart. Here’s an example of how all this works — everything will become clearer if you’re currently confused.
To use the Histogram tool, you first need to identify the bins (categories) that you want to use to create a frequency distribution. The histogram plots out how many times your data falls into each of these categories. Figure 10-5 shows the same worksheet as Figure 10-1, only this time with bins information in the worksheet range E1:E12. The bins information shows Excel exactly what bins (categories) you want to use to categorize the unit sales data. The bins information shown in the worksheet range E1:E12, for example, create hundred-unit bins: 0-100, 101-200, 201-300, and so on.
To create a frequency distribution and a histogram using the data shown in Figure 10-5, follow these steps:
Use the Input Range text box to identify the data that you want to use to create a frequency distribution and histogram. If you want to create a frequency distribution and histogram of unit sales data, for example, enter the worksheet range $B$1:$B$38 into the Input Range text box.
To identify the bins that you use for the frequency distribution and histogram, enter the worksheet range that holds the bins into the Bin Range text box. In the case of the example worksheet shown in Figure 10-5, the bin range is $E$1:$E$12.
If your data ranges include labels (as they do in Figure 10-5), select the Labels check box.
Use the Output Options buttons to tell Excel where it should place the frequency distribution and histogram. To place the histogram in the current worksheet, for example, select the Output Range radio button and then enter the range address into its corresponding Output Range text box.
To place the frequency distribution and histogram in a new worksheet, select the New Worksheet Ply radio button. Then, optionally, enter a name for the worksheet into the New Worksheet Ply text box. To place the frequency distribution and histogram information in a new workbook, select the New Workbook radio button.
Make choices from the Output Options check boxes to control what sort of histogram Excel creates. For example, select the Pareto (Sorted Histogram) check box, and Excel sorts bins in descending order. Conversely, if you don't want bins sorted in descending order, leave the Pareto (Sorted Histogram) check box clear.
Selecting the Cumulative Percentage check box tells Excel to plot a line showing cumulative percentages in your histogram.
Optionally, select the Chart Output check box to have Excel include a histogram chart with the frequency distribution. If you don’t select this check box, you don't get the histogram — only the frequency distribution.
Excel creates the frequency distribution and, optionally, the histogram. Figure 10-7 shows the frequency distribution along with a histogram for the workbook data shown in Figure 10-5.
Note: Excel also provides a Frequency function with which you use can use arrays to create a frequency distribution. For more information about how the Frequency function works, see Chapter 9.
The Data Analysis collection of tools includes an option for calculating rank and percentile information for values in your data set. Suppose, for example, that you want to rank the sales revenue information shown in Figure 10-8. To calculate rank and percentile statistics for your data set, take the following steps.
Excel displays the Rank and Percentile dialog box, as shown in Figure 10-9.
Enter the worksheet range that holds the data into the Input Range text box of the Ranks and Percentile dialog box.
To indicate how you have arranged data, select one of the two Grouped By radio buttons: Columns or Rows. To indicate whether the first cell in the input range is a label, select or deselect the Labels In First Row check box.
Select one of the three Output Options radio buttons to specify where Excel should place the rank and percentile information.
Excel creates a ranking like the one shown in Figure 10-10.
The Data Analysis command also provides a tool for calculating moving and exponentially smoothed averages. Suppose, for sake of illustration, that you’ve collected daily temperature information like that shown in Figure 10-11. You want to calculate the three-day moving average — the average of the last three days — as part of some simple weather forecasting. To calculate moving averages for this data set, take the following steps.
Excel displays the Moving Average dialog box, as shown in Figure 10-12.
Click in the Input Range text box of the Moving Average dialog box. Then identify the input range, either by typing a worksheet range address or by using the mouse to select the worksheet range.
Your range reference should use absolute cell addresses. An absolute cell address precedes the column letter and row number with $ signs, as in $A$1:$A$10.
If the first cell in your input range includes a text label to identify or describe your data, select the Labels in First Row check box.
You can calculate a moving average using any number of values. By default, Excel uses the most recent three values to calculate the moving average. To specify that some other number of values be used to calculate the moving average, enter that value into the Interval text box.
Use the Output Range text box to identify the worksheet range into which you want to place the moving average data. In the worksheet example shown in Figure 10-11, for example, I place the moving average data into the worksheet range B2:B10. (See Figure 10-12.)
If you want a chart that plots the moving average information, select the Chart Output check box.
If you want to calculate standard errors for the data, select the Standard Errors check box. Excel places standard error values next to the moving average values. (In Figure 10-11, the standard error information goes into C2:C10.)
Excel calculates moving average information, as shown in Figure 10-13.
Note: If Excel doesn't have enough information to calculate a moving average for a standard error, it places the error message #N/A into the cell. In Figure 10-13, you can see several cells that show this error message as a value.
The Exponential Smoothing tool also calculates the moving average. However, exponential smoothing weights the values included in the moving average calculations so that more recent values have a bigger effect on the average calculation and old values have a lesser effect. This weighting is accomplished through a smoothing constant.
To illustrate how the Exponential Smoothing tool works, suppose that you’re again looking at the average daily temperature information. (I repeat this worksheet in Figure 10-14.)
To calculate weighted moving averages using exponential smoothing, take the following steps:
Excel displays the Exponential Smoothing dialog box, as shown in Figure 10-15.
To identify the data for which you want to calculate an exponentially smoothed moving average, click in the Input Range text box. Then identify the input range, either by typing a worksheet range address or by selecting the worksheet range. If your input range includes a text label to identify or describe your data, select the Labels check box.
Enter the smoothing constant value in the Damping Factor text box. The Excel Help file suggests that you use a smoothing constant of between 0.2 and 0.3. Presumably, however, if you’re using this tool, you have your own ideas about what the correct smoothing constant is. (If you’re clueless about the smoothing constant, perhaps you shouldn't be using this tool.)
Use the Output Range text box to identify the worksheet range into which you want to place the moving average data. In the worksheet example shown in Figure 10-14, for example, you place the moving average data into the worksheet range B2:B10.
To chart the exponentially smoothed data, select the Chart Output check box.
To calculate standard errors, select the Standard Errors check box. Excel places standard error values next to the exponentially smoothed moving average values.
Excel calculates moving average information, as shown in Figure 10-16.
The Data Analysis command also includes a Random Number Generation tool. The Random Number Generation tool is considerably more flexible than the =Rand() function, which is the other tool that you have available within Excel to produce random numbers. The Random Number Generation tool isn’t really a tool for descriptive statistics. You would probably typically use the tool to help you randomly sample values from a population, but I describe it here in this chapter, anyway, because it works like the other descriptive statistics tools.
To produce random numbers, take the following steps:
Excel displays the Data Analysis dialog box.
Excel displays the Random Number Generation dialog box, as shown in Figure 10-17.
Use the Number of Variables text box to specify how many columns of values you want in your output range. Similarly, use the Number of Random Numbers text box to specify how many rows of values you want in the output range.
You don't absolutely need to enter values into these two text boxes, by the way. You can also leave them blank. In this case, Excel fills all the columns and all the rows in the output range.
Select one of the distribution methods from the Distribution drop-down list. The Distribution drop-down list provides several distribution methods: Uniform, Normal, Bernoulli, Binomial, Poisson, Patterned, and Discrete. Typically, if you want a pattern of distribution other than Uniform, you'll know which one of these distribution methods is appropriate. For example, if you want to pull random numbers from a data set that's normally distributed, you might select the Normal distribution method.
If you select a distribution method that requires parameters, or input values, use the Parameters text box (Value and Probability Input Range) to identify the worksheet range that holds the parameters needed for the distribution method.
You have the option of entering a value that Excel will use to start its generation of random numbers. The benefit of using a Random Seed value, as Excel calls it, is that you can later produce the same set of random numbers by planting the same “seed.”
Use the Output Options radio buttons to select the location that you want for random numbers.
Excel generates the random numbers.
One other data analysis tool — the Sampling tool — deserves to be discussed someplace. I describe it here, even if it doesn’t fit perfectly.
Truth be told, both the Random Number Generation tool (see the preceding section) and the Sampling tool are probably what you would use while preparing to perform inferential statistical analysis of the sort that I describe in Chapter 11. But because these tools work like (and look like) the other descriptive statistics tools, I describe them here.
With the Sampling tool that's part of the Data Analysis command, you can randomly select items from a data set or select every nth item from a data set. For example, suppose that as part of an internal audit, you want to randomly select five titles from a list of books. To do so, you could use the Sampling tool. For purposes of this discussion, pretend that you’re going to use the list of books and book information shown in Figure 10-18.
To sample items from a worksheet like the ones shown in Figure 10-18, take the following steps:
Excel displays the Sampling dialog box, as shown in Figure 10-19.
Use the Input Range text box to describe the worksheet range that contains enough data to identify the values in the data set. For example, in the case of the data set like the one shown in Figure 10-18, the information in column A — TitleID — uniquely identifies items in the data set. Therefore, you can identify (or uniquely locate) items using the input range A1:A38. You can enter this range into the Input Range text box either by directly typing it or by clicking in the text box and then dragging the cursor from cell A1 to cell A38.
If the first cell in the input range holds the text label that describes the data — this is the case in Figure 10-18 — select the Labels check box.
Excel provides two sampling methods for retrieving or identifying items in your data set:
Select from the three radio buttons in the Output Options area to select where the sampling result should appear. To put sampling results into an output range in the current worksheet, select the Output Range radio button and then enter the output range into the text box provided. To store the sampling information in a new worksheet or on a new workbook, select either the New Worksheet Ply or the New Workbook radio button.
Note that Excel grabs item information from the input range. For example, Figure 10-20 shows the information that Excel places on a new worksheet if you use periodic sampling and grab every fifth item. Figure 10-21 shows how Excel identifies the sample if you randomly select five items. Note that the values shown in both Figures 10-20 and 10-21 are the title ID numbers from the input range.
18.224.57.16