Chapter 12 Building inferential statistical formulas


In this chapter, you will:

Learn about inferential statistics and how they apply to business
Build formulas to show the relationship between two sets of data
Extract sample data from a larger population
Use probabilities to make decisions under uncertainty
Infer characteristics of a population using confidence intervals and hypothesis testing

In Chapter 11, “Building descriptive statistical formulas,” you learned how to measure useful statistical values such as the count, mean, maximum, minimum, rank, and standard deviation. These so-called descriptive statistics tell you a great deal about your data, but business analysis and decision-making require more than just descriptions. As an analyst or manager, you also need to draw conclusions about your data. Fortunately, Excel is up to that challenge by offering many worksheet functions that enable you to construct formulas that help you make inferences, such as whether two sets of data are related, the probability of an observation occurring, or whether there’s enough evidence to reject a hypothesis about some data. This chapter introduces you to these worksheet functions, and you learn even more in Chapter 13, “Applying regression to track trends and make forecasts.”

Understanding inferential statistics

Descriptive statistics consists of measures such as count, sum, mean, rank, and standard deviation that tell you something about a data set. The assumption underlying descriptive statistics is that you’re working with a subset of a larger collection of data. In the language of statisticians, descriptive statistics operate on a sample of some larger population.

Inferential statistics is a set of techniques that enable you to derive conclusions—that is, make inferences—about the entire population based on a sample. It’s important to understand that inferring population characteristics based on a sample is inherently uncertain. Certainty only comes when you measure the population as a whole, such as when the government performs a census. These sorts of large-scale experiments are almost always impractical (too time-consuming and expensive), so samples of the population are observed. That injects uncertainty into the process, but one of the key characteristics of inferential statistics is that it gives you multiple ways to measure that uncertainty.

In statistics, a variable is an aspect of a data set that can be measured in some way (counted, averaged, and so on). The following are all statistical variables:

In company financial data, the revenues generated over a fiscal year
In a database of historical economic data, the annual interest rate
In a series of coin tosses, the collection of flips that turn up “tails”

Much of inferential statistics involves interrogating the relationship between variables such as these. Note, however, that not all variables are inherently interesting, at least from a statistical point of view. For example, a table of invoices might include a Units Ordered column, a Unit Price column, and a Total Price column that’s calculated by multiplying the units ordered by the unit price. In this example, there’s no point analyzing the “relationship” between the units ordered and total price variables because that relationship is purely arithmetic (and hence trivial).

Of more interest to us are three other types of variable:

Independent variable: These are values that are generated without reference to or reliance upon another process. For example, if you’re analyzing revenues per month, the months represent the independent variable.
Dependent variable: These are values that are generated due to some other process. In the revenues per month example, the revenues represent the dependent variable.
Random variable: These are values that are generated by a chance process. For example, the results of a coin toss represent a random variable. Random variables can be either discrete, which means the variable takes its values from a countable set of distinct values (such as the number of tails in 10 coin tosses), or continuous, which means the variable can take any value within an interval (such as the time between customer arrivals).

Sampling data

If you have a population data set, you might find that it’s too large or too slow to work with directly. To speed up your work, you can generate a sample from that population and then use the rest of this chapter’s inferential statistics functions and formulas to draw conclusions about the entire population based on your sample. There are two main types of sample you can generate:

Periodic: A sample that consists of every nth observation from the population. If n is 10, for example, the sample would consist of data points 10, 20, 30, and so on.
Random: A sample that consists of n values chosen randomly from the population.

If you have the Analysis ToolPak add-in installed (see Chapter 4’s “Loading the Analysis ToolPak” section), the Data Analysis command (on the Data tab, select Data Analysis) comes with a Sampling tool that can generate either a periodic or a random sample.

The Sampling tool gets the job done, but you can also generate custom samples using formulas. One way to do this is to use Excel’s OFFSET() function, which returns a reference to a cell or range that’s a specified number of rows and columns from a given reference. Its syntax is OFFSET(reference, rows, cols[, height][, width]).
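To make the two sample types concrete, here is a minimal sketch written in Python rather than as Excel formulas, so it is an illustration of the logic and not the chapter's OFFSET() technique. The function names periodic_sample and random_sample and the data are hypothetical. The random version draws without replacement, which is an assumption; the text only says the values are chosen randomly.

```python
import random


def periodic_sample(population, n):
    """Return every nth observation: positions n, 2n, 3n, ... (counting from 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Position n (1-based) is index n - 1 in a 0-based Python list.
    return population[n - 1::n]


def random_sample(population, n, seed=None):
    """Return n values chosen at random (assumption: without replacement)."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)


population = list(range(1, 101))  # hypothetical data: the values 1 to 100

print(periodic_sample(population, 10))
# [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(random_sample(population, 5, seed=42))  # five distinct values from 1 to 100
```

With n set to 10, the periodic sample contains data points 10, 20, 30, and so on, matching the description of a periodic sample above.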

Interpreting the correlation coefficient between two variables:

Correlation Coefficient	Interpretation
1	The two variables have a direct relationship that is perfectly and positively correlated: the data points fall exactly on an upward-sloping straight line. For example, every extra $1,000 of advertising is matched by the same fixed increase in sales.
Between 0 and 1	The two variables have a direct relationship. For example, an increase in advertising tends to accompany an increase in sales. The higher the number, the stronger the direct relationship.
0	There is no linear relationship between the variables.
Between 0 and –1	The two variables have an inverse relationship. For example, an increase in advertising tends to accompany a decrease in sales. The lower the number (that is, the closer to –1), the stronger the inverse relationship.
–1	The variables have an inverse relationship that is perfectly and negatively correlated: the data points fall exactly on a downward-sloping straight line. For example, every extra $1,000 of advertising is matched by the same fixed decrease in sales (and, presumably, a new advertising department).
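The extremes of the table can be checked numerically. The following Python sketch computes the standard Pearson correlation coefficient by hand; it is an illustration only, not Excel's implementation, and the function name correlation and the advertising and sales figures are hypothetical.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


advertising = [10, 20, 30, 40, 50]      # hypothetical spending
sales_up = [105, 110, 115, 120, 125]    # rises by a fixed amount per step
sales_down = [125, 120, 115, 110, 105]  # falls by a fixed amount per step

print(correlation(advertising, sales_up))    # 1.0
print(correlation(advertising, sales_down))  # -1.0
```

Both sales series change by a fixed amount for each step in advertising, so the points lie on a straight line and the coefficient is exactly 1 or –1.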

Arguments of the PROB() function:

Argument	Description
x_range	A range or array containing the possible observations.
prob_range	A range or array containing the probability values for each observation in x_range.
lower_limit	The lower bound on the range of values for which you want to know the probability.
upper_limit	The upper bound on the range of values for which you want to know the probability; if omitted, the returned value is the probability that an observation equals the lower_limit.
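The way these four arguments combine can be sketched in Python. This is an illustration of the calculation the arguments describe, not Excel's code: the function name prob mirrors the argument names above, the observations and probabilities are hypothetical, and the check that the probabilities sum to 1 stands in for the error Excel reports in that case.

```python
def prob(x_range, prob_range, lower_limit, upper_limit=None):
    """Probability that an observation falls between the limits (inclusive)."""
    if len(x_range) != len(prob_range):
        raise ValueError("x_range and prob_range must be the same size")
    if abs(sum(prob_range) - 1.0) > 1e-9:
        raise ValueError("the values in prob_range must sum to 1")
    if upper_limit is None:
        # No upper bound: probability that an observation equals lower_limit.
        upper_limit = lower_limit
    return sum(p for x, p in zip(x_range, prob_range)
               if lower_limit <= x <= upper_limit)


observations = [0, 1, 2, 3]             # hypothetical observations
probabilities = [0.2, 0.3, 0.1, 0.4]    # hypothetical probabilities, sum to 1

print(round(prob(observations, probabilities, 1, 3), 4))  # 0.8
print(prob(observations, probabilities, 2))               # 0.1
```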

Arguments of the BINOM.DIST() function:

Argument	Description
number_s	The number of success observations for which you want to calculate the probability.
trials	The total number of trials in the sample.
probability_s	The probability of getting a success observation in each trial.
cumulative	A logical value that determines how Excel calculates the result. Use FALSE to calculate the probability that the sample has number_s successes; use TRUE to return the cumulative probability that the sample contains at most number_s successes.
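As a cross-check on what the cumulative argument changes, here is a minimal Python sketch of the standard binomial formula using the same argument names. The function name binom_dist is hypothetical and this is not Excel's implementation. The example uses 10 tosses of a fair coin.

```python
from math import comb


def binom_dist(number_s, trials, probability_s, cumulative):
    """Binomial probability of exactly (or at most) number_s successes."""
    if not 0 <= number_s <= trials:
        raise ValueError("number_s must be between 0 and trials")

    def pmf(k):
        # Ways to place k successes among the trials, times the probability
        # of any one such arrangement.
        return comb(trials, k) * probability_s ** k * (1 - probability_s) ** (trials - k)

    if cumulative:
        # At most number_s successes: add up 0, 1, ..., number_s.
        return sum(pmf(k) for k in range(number_s + 1))
    return pmf(number_s)


print(binom_dist(5, 10, 0.5, False))  # 0.24609375  (exactly 5 heads)
print(binom_dist(5, 10, 0.5, True))   # 0.623046875 (at most 5 heads)
```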

Arguments of the HYPGEOM.DIST() function:

Argument	Description
sample_s	The number of sample success observations for which you want to calculate the probability.
number_sample	The size of the sample.
population_s	The number of success observations in the population.
number_pop	The size of the population.
cumulative	A logical value that determines how Excel calculates the result. Use FALSE to calculate the probability that the sample has sample_s successes; use TRUE to return the cumulative probability that the sample contains at most sample_s successes.
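The same kind of cross-check works for these five arguments. The Python sketch below applies the standard hypergeometric formula (sampling without replacement); the function name hypgeom_dist is hypothetical and this is not Excel's implementation. The example assumes a population of 20 items containing 8 successes, from which 4 items are sampled.

```python
from math import comb


def hypgeom_dist(sample_s, number_sample, population_s, number_pop, cumulative):
    """Hypergeometric probability of exactly (or at most) sample_s successes."""
    if not 0 <= sample_s <= number_sample <= number_pop:
        raise ValueError("need 0 <= sample_s <= number_sample <= number_pop")
    if not 0 <= population_s <= number_pop:
        raise ValueError("population_s must be between 0 and number_pop")

    failures = number_pop - population_s

    def pmf(k):
        # Ways to pick k successes and the remaining items from the failures,
        # divided by all ways to pick the sample. comb() returns 0 when an
        # impossible number of items is requested.
        return (comb(population_s, k) * comb(failures, number_sample - k)
                / comb(number_pop, number_sample))

    if cumulative:
        return sum(pmf(k) for k in range(sample_s + 1))
    return pmf(sample_s)


print(round(hypgeom_dist(1, 4, 8, 20, False), 4))  # 0.3633 (exactly 1 success)
print(round(hypgeom_dist(1, 4, 8, 20, True), 4))   # 0.4654 (at most 1 success)
```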
