Chapter 13

Applying regression to track trends and make forecasts

In this chapter, you will:

  • Understand regression analysis

  • Learn how to use simple regression on linear data

  • Plot and calculate best-fit trendlines

  • Plot and calculate forecasted values

  • Learn how to use simple regression on nonlinear data

  • Understand and use multiple regression analysis

In these complex and uncertain times, forecasting business performance is increasingly important. Today, more than ever before, managers at all levels need to make intelligent predictions of future sales and profit trends as part of their overall business strategy. By forecasting sales six months, a year, or even three years down the road, managers can anticipate related needs such as employee acquisitions, warehouse space, and raw material requirements. Similarly, a profit forecast enables a company to plan for its future expansion.

Business forecasting has been around for many years, and various methods have been developed—some of them more successful than others. The most common forecasting method is the qualitative “seat of the pants” approach, in which a manager (or a group of managers) estimates future trends based on experience and knowledge of the market. This method, however, suffers from an inherent subjectivity and a short-term focus because many managers tend to extrapolate from recent experience and ignore the long-term trend. Other methods (such as averaging past results) are more objective but generally are useful for forecasting only a few months in advance.

This chapter presents a technique called regression analysis. Regression is a powerful statistical procedure that has become a popular business tool. In its general form, you use regression analysis to determine the relationship between one phenomenon and another. For example, car sales might be dependent on interest rates, and units sold might be dependent on the amount spent on advertising. The dependent phenomenon is called the dependent variable, or the y-value, and the phenomenon upon which it’s dependent is called the independent variable, or the x-value. (Think of a chart or graph on which the independent variable is plotted along the horizontal [x] axis and the dependent variable is plotted along the vertical [y] axis.)

Given these variables, you can do two things with regression analysis:

  • Determine the relationship between the known x- and y-values and use the results to calculate and visualize the overall trend of the data.

  • Use the existing trend to forecast new y-values.

As you’ll see in this chapter, Excel is well stocked with tools that enable you to both calculate the current trend and make forecasts no matter what type of data you’re dealing with.

Choosing a regression method

Three methods of regression analysis are used most often in business:

  • Simple regression: Use this type of regression when you’re dealing with only one independent variable. For example, if the dependent variable is car sales, the independent variable might be interest rates. You also need to decide whether your data is linear or nonlinear:

    • Linear means that if you plot the data on a chart, the resulting data points resemble (roughly) a line.

    • Nonlinear means that if you plot the data on a chart, the resulting data points form a curve.

  • Polynomial regression: Use this type of regression when you’re dealing with only one independent variable, but the data fluctuates in such a way that the pattern in the data doesn’t resemble either a straight line or a simple curve.

  • Multiple regression: Use this type of regression when you’re dealing with more than one independent variable. For example, if the dependent variable is car sales, the independent variables might be interest rates and disposable income.

You’ll learn about all three methods in this chapter.

Using simple regression on linear data

With linear data, the dependent variable is related to the independent variable by some constant factor. For example, you might find that car sales (the dependent variable) increase by 1 million units whenever interest rates (the independent variable) decrease by 1%. Similarly, you might find that division revenue (the dependent variable) increases by $100,000 for every $10,000 you spend on advertising (the independent variable).

Analyzing trends using best-fit lines

You make these sorts of determinations by examining the trend underlying the current data you have for the dependent variable. In linear regression, you analyze the current trend by calculating the line of best fit, or the trendline. This is a line through the data points for which the differences between the points above and below the line cancel each other out (more or less).

Image Note

Excel includes a tool called Forecast Sheet that simplifies many of the tasks and calculations that I present in this chapter. To use it, select your data, select the Data tab, and then select Forecast Sheet. This opens the Create Forecast Worksheet dialog box, which shows a basic best-fit trendline. You can select Options and change the default settings to gain a bit more control over the result. Select Create to add the new forecasting worksheet.

Plotting a best-fit trendline

The easiest way to see the best-fit line is to use a chart. Note, however, that this works only if your data is plotted using an XY (scatter) chart. For example, Figure 13-1 shows a worksheet with quarterly sales figures plotted on an XY chart. Here, the quarterly sales data is the dependent variable, and the period is the independent variable. (In this example, the independent variable is just time, represented, in this case, by fiscal quarters.) You can add a trendline through the plotted points.

The figure shows an Excel worksheet with quarterly sales data plotted as an XY chart.
FIGURE 13-1 To see a trendline through your data, first make sure the data is plotted using an XY chart.

The following steps show you how to add a trendline to a chart:

  1. Select the chart and, if more than one data series are plotted, select the series you want to work with.

  2. Select Design > Add Chart Element > Trendline > More Trendline Options. Excel displays the Format Trendline task pane, shown in Figure 13-2.

    The figure shows the Format Trendline task pane showing the Trendline Options radio button, with the Linear radio button selected.
    FIGURE 13-2 In the Format Trendline task pane, use the Trendline Options tab to select the type of trendline you want to see.
  3. On the Trendline Options tab, select Linear.

  4. Select the Display Equation On Chart check box. (See the next section, “Understanding the regression equation.”)

  5. Select the Display R-Squared Value On Chart check box. (See “Understanding R2,” later in this chapter.)

  6. Select Close (X). Excel inserts the trendline. Note that you might need to drag the regression equation text box away from the trendline to read the values.

Figure 13-3 shows the best-fit trendline added to the chart.

The figure shows an Excel worksheet with the quarterly sales chart that includes a best-fit trendline.
FIGURE 13-3 This quarterly sales chart has an added best-fit trendline.
Understanding the regression equation

In the steps outlined in the previous section, I instructed you to select the Display Equation On Chart check box. Doing this displays the regression equation on the chart, as pointed out in Figure 13-3. This equation is crucial to regression analysis because it gives you a specific formula for the relationship between the dependent variable and the independent variable.

For linear regression, the best-fit trendline is a straight line with an equation that takes the following form:

y = mx + b

Here’s how you can interpret this equation with respect to the quarterly sales data:

y

This is the dependent variable, so it represents the trendline value (quarterly sales) for a specific period.

x

This is the independent variable, which, in this example, is the period (quarter) you’re working with.

m

This is the slope of the trendline. In other words, it’s the amount by which the sales increase per period, according to the trendline.

b

This is the y-intercept, which means that it’s the starting value for the trend.

Here’s the regression equation for the example (refer to Figure 13-3):

y = 1407.6x + 259800

To determine the first point on the trendline, substitute 1 for x:

y = 1407.6 * 1 + 259800

The result is 261,207.6.

Image Caution

It’s important not to view the trendline values as somehow trying to predict or estimate the actual y-values (sales). The trendline simply gives you an overall picture of how the y-values change when the x-values change.

Understanding R2

When you select the Display R-Squared Value On Chart check box when adding a trendline, Excel places the following on the chart:

R2 = n

Here, n is called the coefficient of determination (which statisticians abbreviate as r2, and Excel abbreviates as R2). This is actually the square of the correlation; as I discuss in Chapter 12, “Building inferential statistical formulas,” the correlation tells you something about how well two things are related to each other. In this context, R2 gives you some idea of how well the trendline fits the data. Roughly, it tells you the proportion of the variance in the dependent variable that is associated with the independent variable. Generally, the closer the result is to 1, the better the fit. Values below about 0.7 mean that the trendline is not a very good fit for the data.

Image Tip

If you don’t get a good fit with a linear trendline, your data might not be linear. Try using a different trendline type to see if you can increase the value of R2.

You’ll see in the next section that it’s possible to calculate values for the best-fit trendline. Having those values enables you to calculate the correlation between the known y-values and the generated trend values by using the CORREL() function, which I describe in Chapter 12. Note that squaring the CORREL() result gives you the value of R2.

Calculating best-fit values using TREND()

The problem with using a chart best-fit trendline is that you don’t get actual values to work with. If you want to get some values on the worksheet, you can calculate individual trendline values using the regression equation. However, what if the underlying data changes? For example, those values might be estimates, or they might change as more accurate data comes in. In that case, you need to delete the existing trendline, add a new one, and then recalculate the trend values based on the new equation.

If you need to work with worksheet trend values, you can avoid having to perform repeated trendline analyses by calculating the values using Excel’s TREND() function:

TREND(known_y's[, known_x's][, new_x's][, const])

known_y’s

A range reference or an array of the known y-values—such as the historical values—from which you want to calculate the trend.

known_x’s

A range reference or an array of the x-values associated with the known y-values. If you omit this argument, the known_x’s are assumed to be the array {1,2,3,...,n}, where n is the number of known_y’s.

new_x’s

A range reference or an array of the new x-values for which you want corresponding y-values.

const

A logical value that determines where Excel places the y-intercept. If you use FALSE, the y-intercept is placed at 0; if you use TRUE (this is the default), Excel calculates the y-intercept based on the known_y’s.

To generate the best-fit trend values, you need to specify the known_y’s argument and, optionally, the known_x’s argument. In the quarterly sales example, the known y-values are the actual sales numbers, which lie in the range D2:D13. The known x-values are the period numbers in the range C2:C13. Therefore, to calculate the best-fit trend values, you select a range that is the same size as the known values and enter the following formula as an array:

=TREND(D2:D13, C2:C13)

Figure 13-4 shows the results of this TREND() array formula in column F. For comparison purposes, the sheet also includes the trend values (in column E) generated using the regression equation from the chart trendline shown in Figure 13-3. (Note that some of the values are slightly off. That’s because the values for the slope and intercept shown in the regression equation have been rounded off for display in the chart.)

The figure shows an Excel worksheet with cell F2 selected, which displays a TREND() function that is part of an array.
FIGURE 13-4 Best-fit trend values (F2:F13) created with the TREND() function.

Image Tip

In the previous section, I mention that you can determine the correlation between the known dependent values and the calculated trend values by using the CORREL() function. Here’s an array formula that provides a shorthand method for returning the correlation:

=CORREL(array1, TREND(known_y's, known_x's))
Calculating best-fit values using LINEST()

Using TREND() is the most direct way to calculate trend values, but Excel offers a second method that calculates the trendline’s slope and y-intercept. You can then plug these values into the general linear regression equation—y = mx + b—as m and b, respectively. You calculate the slope and y-intercept by using the LINEST() function:

LINEST(known_y's[, known_x's][, const][, stats])

known_y’s

A range reference or an array of the known y-values from which you want to calculate the trend.

known_x’s

A range reference or an array of the x-values associated with the known y-values. If you omit this argument, the known_x’s are assumed to be the array {1,2,3,...,n}, where n is the number of known_y’s.

const

A logical value that determines where Excel places the y-intercept. If you use FALSE, the y-intercept is placed at 0; if you use TRUE (this is the default), Excel calculates the y-intercept based on the known_y’s.

stats

A logical value that determines whether LINEST() returns additional regression statistics besides the slope and intercept. The default is FALSE.

When you use LINEST() without the stats argument, the function returns a 1×2 array, where the value in the first column is the slope of the trendline and the value in the second column is the intercept. For example, the following formula, entered as a 1×2 array, returns the slope and intercept of the quarterly sales trendline:

=LINEST(D2:D13, C2:C13)

In Figure 13-5, the returned array values are shown in cells H2 and I2. This worksheet also uses these values to compute the trendline values by substituting $H$2 for m and $I$2 for b in the linear regression equation. For example, the following formula calculates the trend value for period 1:

=$H$2 * C2 + $I$2
The figure shows an Excel worksheet with cell F2 selected, which displays a formula that implements the linear regression equation: y = mx + b, where m is the slope and b is the y-intercept. Both values are calculated using the LINEST() function in cells H2 and I2, respectively.
FIGURE 13-5 This worksheet shows the best-fit trend values (F2:F13) created with the results of the LINEST() function (H2:I2) plugged into the linear regression equation.

If you set the stats argument to TRUE, the LINEST() function returns 10 regression statistics in a 5×2 array. The returned statistics are listed in Table 13-1, and Figure 13-6 shows an example of the returned array.

TABLE 13-1 Regression statistics returned by LINEST() when the stats argument is set to TRUE

Array Location

Statistic

Description

Row 1, column 1

m

The slope of the trendline

Row 1, column 2

b

The y-intercept of the trendline

Row 2, column 1

se

The standard error value for m

Row 2, column 2

seb

The standard error value for b

Row 3, column 1

R2

The coefficient of determination

Row 3, column 2

sey

The standard error value for the y estimate

Row 4, column 1

F

The F statistic

Row 4, column 2

df

The degrees of freedom

Row 5, column 1

ssreg

The regression sum of squares

Row 5, column 2

ssresid

The residual sum of squares

The figure shows an Excel worksheet with the array of regression statistics returned by the LINEST() function when the stats argument is set to TRUE.
FIGURE 13-6 The range H5:I9 contains the array of regression statistics returned by LINEST() when its stats argument is set to TRUE.

Image Note

These and other regression statistics are available via the Analysis ToolPak’s Regression tool. Assuming that the Analysis ToolPak add-in is installed (see “Loading the Analysis ToolPak” in Chapter 4, “Understanding functions”), select Data > Data Analysis, select Regression, and then select OK. Use the Regression dialog box to specify the ranges for the y-values and x-values and to select which statistics you want to see in the output.

Most of these values are beyond the scope of this book. However, notice that one of the returned values is R2, the coefficient of determination, which tells how well the trendline fits the data. If you want just this value from the LINEST() array, use this formula (see cell I11 in Figure 13-6):

=INDEX(LINEST(known_y's, known_x's, , TRUE), 3, 1)

Image Note

You can also calculate the slope, intercept, and R2 value directly by using the following functions:

Click here to view code image

SLOPE(known_y's, known_x's)
INTERCEPT(known_y's, known_x's)
RSQ(known_y's, known_x's)

The syntax for these functions is the same as that of the first two arguments of the TREND() function, except that the known_x’s argument is required. Here’s an example:

=RSQ(D2:D13, C2:C13)
Analyzing the sales versus advertising trend

We tend to think of trend analysis as having a time component. That is, when we think about looking for a trend, we usually think about finding a pattern over a period of time. But regression analysis is more versatile than that. You can use it to compare any two phenomena, as long as one is dependent on the other in some way.

For example, it’s reasonable to assume that there is some relationship between how much you spend on advertising and how much you sell. In this case, the advertising costs are the independent variable, and the sales revenues are the dependent variable. You can apply regression analysis to investigate the exact nature of the relationship.

Figure 13-7 shows a worksheet that does this. The advertising costs are in A2:A13, and the sales revenues over the same period (these could be monthly numbers, quarterly numbers, and so on—the time period doesn’t matter) are in B2:B13. The rest of the worksheet applies the same trend-analysis techniques that you’ve learned in the past few sections.

The figure shows an Excel worksheet with a regression analysis for a table of sales and advertising expenses data.
FIGURE 13-7 This figure shows a trend analysis for advertising costs versus sales revenues.

Making forecasts

Knowing the overall trend exhibited by a data set is useful because it tells you the broad direction that sales or costs or employee acquisitions is going, and it gives you a good idea of how related the dependent variable is to the independent variable. But a trend is also useful for making forecasts in which you extend the trendline into the future (what will sales be in the first quarter of next year?) or calculate the trend value given some new independent value (if we spend $25,000 on advertising, what will the corresponding sales be?).

How accurate is such a prediction? A projection based on historical data assumes that the factors influencing the data over the historical period will remain constant. If this is a reasonable assumption in your case, the projection will be a reasonable one. Of course, the longer you extend the line, the more likely it is that some of the factors will change or that new ones will arise. As a result, best-fit extensions should be used only for short-term projections.

Plotting forecasted values

If you want just a visual idea of a forecasted trend, you can extend the chart trendline that you created earlier. The following steps show you how to add a forecasting trendline to a chart:

  1. Select the chart and, if more than one data series are plotted, click the series you want to work with.

  2. Select Design > Add Chart Element > Trendline > More Trendline Options to display the Format Trendline task pane.

  3. On the Trendline Options tab, select Linear.

  4. Select the Display Equation On Chart check box. (See “Understanding the regression equation,” earlier in this chapter.)

  5. Select the Display R-Squared Value On Chart check box. (See “Understanding R2,” earlier in this chapter.)

  6. Use the Forward text box to select the number of units you want to project the trendline into the future. (For example, to extend the quarterly sales number into the next year, set Forward to 4 to extend the trendline by four quarters.)

  7. Select Close (X). Excel inserts the trendline and extends it into the future.

Figure 13-8 shows the quarterly sales trendline extended by four quarters.

The figure shows an Excel worksheet with a trendline that has been extended four quarters into the future.
FIGURE 13-8 The trendline has been extended four quarters into the future.
Extending a linear trend with the fill handle

If you prefer to see exact data points in your forecast, you can use the fill handle to project a best-fit line into the future. Here are the steps to follow:

  1. Select the historical data on the worksheet.

  2. Click and drag the fill handle to extend the selection. Excel calculates the best-fit line from the existing data, projects this line into the new data, and calculates the appropriate values.

Figure 13-9 shows an example. Here, I’ve used the fill handle to project the period numbers and quarterly sales figures over the next fiscal year. The accompanying chart clearly shows the extended best-fit values.

The figure shows an Excel worksheet with a linear trend that has been extended using the fill handle.
FIGURE 13-9 When you use the fill handle to extend historical data into the future, Excel uses a linear projection to calculate the new values.
Extending a linear trend using the Series command

You also can use the Series command to project a best-fit line. The following steps show you how it’s done:

  1. Select the range that includes both the historical data and the cells that will contain the projections (and ensure that the projection cells are blank).

  2. Select Home > Fill > Series. Excel displays the Series dialog box.

  3. Select the AutoFill option.

  4. Select OK. Excel fills in the blank cells with the best-fit projection.

The Series command is also useful for producing the data that defines the full best-fit line so that you can see the actual trendline values. The following steps show you how it’s done:

  1. Copy the historical data into an adjacent row or column.

  2. Select the range that includes both the copied historical data and the cells that will contain the projections. (Again, ensure that the projection cells are blank).

  3. Select Home > Fill > Series. Excel displays the Series dialog box.

  4. Select the Trend check box.

  5. Select the Linear option.

  6. Select OK. Excel replaces the copied historical data with the best-fit numbers and projects the trend onto the blank cells.

In Figure 13-10, the trend values created by the Series command are in E2:E17 and are plotted on the chart with the best-fit line on top of the historical data.

The figure shows an Excel worksheet with a linear trend that has been extended using the series command.
FIGURE 13-10 This worksheet shows a best-fit trendline created with the Series command.
Forecasting with the regression equation

You can also forecast individual dependent values by using the regression equation that is returned when you add the chart trendline. (Remember that you must click the Display Equation on Chart check box when adding the trendline.) Recall the general regression equation for a linear model:

y = mx + b

The regression equation displayed by the trendline feature gives you the m and b values, so to determine a new value for y, just plug in a new value for x.

For example, in the quarterly sales model, Excel calculated the following regression equation:

y = 1407.6x + 259800

To find the trend value for the thirteenth period, you substitute 13 for x:

y = 1407.6 * 13 + 259800

The result is 278,099, the projected sales for the thirteenth period (first quarter 2019).

Forecasting with TREND()

The TREND() function is also capable of forecasting new values. To extend a trend and generate new values, you need to add the new_x’s argument to the TREND() function. Here’s the basic procedure for setting this up on the worksheet:

  1. Add the new x-values to the worksheet. For example, to extend the quarterly sales trend into the next fiscal year, you’d add the values 13 through 16 to the Period column.

  2. Select a range large enough to hold all the new values. For example, if you’re adding four new values, select four cells in a column or row, depending on the structure of your data.

  3. Enter the TREND() function as an array formula, specifying the range of new x-values as the new_x’s argument. Here’s the array formula for the quarterly sales example:

    =TREND(D2:D13, C2:C13, C14:C17)

Figure 13-11 shows the forecasted values in F14:F17. The values in column E were derived using the regression equation and are included for comparison.

The figure shows an Excel worksheet with a linear trend that has been extended using the TREND() function.
FIGURE 13-11 The range F14:F17 contains the forecasted values calculated by the TREND() function.
Forecasting with LINEST()

Recall that the LINEST() function returns the slope and y-intercept of the trendline. When you know these numbers, forecasting new values is a straightforward matter of plugging them into the linear regression equation along with a new value of x. For example, if the slope is in cell H2, the intercept is in I2, and the new x-value is in C14, the following formula will return the forecasted value:

=$H$2 * C14 + $I$2

Figure 13-12 shows a worksheet that uses this method to forecast the Fiscal 2019 sales figures.

The figure shows an Excel worksheet with a linear trend that has been extended using the LINEST() function.
FIGURE 13-12 The range F14:F17 contains the forecasted values calculated by the regression equation, using the slope (H2) and intercept (I2) returned by the LINEST() function.

Image Note

You also can calculate a forecasted value for x by using the FORECAST() function:

Click here to view code image

FORECAST(x, known_y's, known_x's)

Here, x is the new x-value that you want to work with, and known_y’s and known_x’s are the same as with the TREND() function (except that the known_x’s argument is required). Here’s an example:

=FORECAST(13, D2:D13, C2:C13)

Using simple regression on nonlinear data

As you saw in the case study, the data you work with doesn’t always fit a linear pattern. If the data shows seasonal variations, you can compute the trend and forecast values by working with seasonally adjusted numbers, which you also saw in the case study. But many business scenarios aren’t either linear or seasonal. The data might look more like a curve, or it might fluctuate without any apparent pattern.

These nonlinear patterns might seem more complex, but Excel offers a number of useful tools for performing regression analysis on this type of data.

Working with an exponential trend

An exponential trend is a trend that rises or falls at an increasingly higher rate. Fads often exhibit this kind of behavior. A product might sell steadily and unspectacularly for a while, but then word starts getting around—perhaps because of a mention in the newspaper or on television—and sales start to rise. If these new customers enjoy the product, they tell their friends about it, and those people purchase the product, too. They tell their friends, the media notice that everyone’s talking about this product, and a bona fide fad ensues.

This is called an exponential trend because, as a graph, it looks much like a number being raised to successively higher values of an exponent (for example, 101, 102, 103, and so on). This is often modeled using the constant e (approximately 2.71828), which is the base of the natural logarithm. Figure 13-20 shows a worksheet that uses the EXP() function in column B to return e raised to the successive powers in column A. The chart shows the results as a classic exponential curve.

The figure shows an Excel worksheet that uses the EXP() function in column B to return e raised to the successive powers in column A. A chart shows that the results graph as an exponential curve.
FIGURE 13-20 Raising the constant e to successive powers produces a classic exponential trend pattern.

Figure 13-21 shows a worksheet that contains weekly data for the number of units sold of a product. As you can see, the unit sales hold steady for the first eight or nine weeks and then climb rapidly. As this chart illustrates, the sales curve is very much like an exponential growth curve. The next couple sections show you how to track the trend and make forecasts based on such a model.

An Excel worksheet with a chart of weekly unit sales.
FIGURE 13-21 This chart of weekly unit sales shows a definite exponential pattern.
Plotting an exponential trendline

The easiest way to see the trend and forecast is to add a trendline—specifically, an exponential trendline—to the chart. Here are the steps to follow:

  1. Select the chart and, if more than one data series are plotted, click the series you want to work with.

  2. Select Design > Add Chart Element > Trendline > More Trendline Options to display the Format Trendline task pane.

  3. On the Trendline Options tab, select the Exponential option.

  4. Select the Display Equation on Chart and Display R-Squared Value on Chart check boxes.

  5. Select Close (X). Excel inserts the trendline.

Figure 13-22 shows the exponential trendline added to the chart.

The figure shows the chart of weekly unit sales with an exponential trendline added.
FIGURE 13-22 This figure shows the weekly unit sales chart with an exponential trendline added.
Calculating exponential trend and forecast values

In Figure 13-22, notice that the regression equation for an exponential trendline takes the following general form:

y = bemx

Here, b and m are constants. So, knowing these values, given an independent value x, you can compute its corresponding point on the trendline using the following formula:

=b * EXP(m * x)

In the trendline of Figure 13-22, these constant values are 7.1875 and 0.4038, respectively. So, the formula for trend values becomes this:

=7.1875 * EXP(0.4038 * x)

If x is a value between 1 and 18, you get a trend point for the existing data. To get a forecast, you use a value higher than 18. For example, using x equal to 19 gives a forecast value of 15,437 units:

=7.1875 * EXP(0.4038 * 19)
Exponential trending and forecasting using the GROWTH() function

As you learned with linear regression, it’s often useful to work with actual trend values instead of just visualizing the trendline. With a linear model, you use the TREND() function to generate actual values. The exponential equivalent is the GROWTH() function:

GROWTH(known_y's[, known_x's][, new_x's][, const])

known_y’s

A range reference or an array of the known y-values.

known_x’s

A range reference or an array of the x-values associated with the known y-values. If you omit this argument, the known_x’s are assumed to be the array {1,2,3,...,n}, where n is the number of known_y’s.

new_x’s

A range reference or an array of the new x-values for which you want corresponding y-values.

const

A logical value that determines the value of the b constant in the exponential regression equation. If you use FALSE, b is set to 1; if you use TRUE (which is the default), Excel calculates b based on the known_y’s.

With the exception of a small difference in the const argument, the GROWTH() function syntax is identical to that of TREND(). You use the two functions in the same way as well. For example, to return the exponential trend values for the known values, you specify the known_y’s argument and, optionally, the known_x’s argument. Here’s the formula for the weekly units example, which is entered as an array:

{=GROWTH(B2:B19, A2:A19)}

To forecast values using GROWTH(), add the new_x’s argument. For example, to forecast the weekly sales for weeks 19 and 20, assuming that these x-values are in A20:A21, you use the following array formula:

{=GROWTH(B2:B19, A2:A19, A20:A21)}

Figure 13-23 shows the GROWTH() formulas at work. The numbers in C2:C19 are the existing trend values, and the numbers in C20 and C21 are the forecast values.

The figure shows an Excel worksheet that uses the GROWTH() function to extend an exponential trend.
FIGURE 13-23 This worksheet shows the weekly unit sales with existing trend and forecast values calculated by the GROWTH() function.

What if you want to calculate the constants b and m? You can do that by using the exponential equivalent of LINEST(), which is LOGEST():

LOGEST(known_y's[, known_x's][, const][, stats])

known_y’s

A range reference or an array of the known y-values from which you want to calculate the trend.

known_x’s

A range reference or an array of the x-values associated with the known y-values. If you omit this argument, the known_x’s are assumed to be the array {1,2,3,...,n}, where n is the number of known_y’s.

const

A logical value that determines the value of the b constant in the exponential regression equation. If you use FALSE, b is set to 1; if you use TRUE (this is the default), Excel calculates b based on the known_y’s.

stats

A logical value that determines whether LOGEST() returns additional regression statistics besides b and m. The default is FALSE. If you use TRUE, LOGEST() returns the extra stats, which are (except for b and m) the same as those returned by LINEST().

Actually, LOGEST() doesn’t return the value for m directly. That’s because LOGEST() is designed for the following regression formula:

y = bm1x

However, this is equivalent to the following:

y = b * EXP(LN(m1) * x)

This is the same as our exponential regression equation, except that we have LN(m1) instead of just m. Therefore, to derive m, you need to use LN(m1) to take the natural logarithm of the m1 value returned by LOGEST().

As with LINEST(), if you set stats to FALSE, LOGEST() returns a 1×2 array, with m (actually m1) in the first cell and b in the second cell. Figure 13-24 shows a worksheet that puts LOGEST() through its paces:

  • The value of b is in cell H2. The value of m1 is in cell G2, and cell I2 uses LN() to get the value of m.

  • The values in column D are calculated using the exponential regression equation, with the values for b and m plugged in.

  • The values in column E are calculated using the LOGEST() regression equation, with the values for b and m1 plugged in.

The figure shows an Excel worksheet with an exponential trend that has been extended using the LOGEST() function.
FIGURE 13-24 This worksheet shows the weekly unit sales with data generated by the LOGEST() function.

Working with a logarithmic trend

A logarithmic trend is a trend that is the inverse of an exponential trend: The values rise (or fall) quickly in the beginning and then level off. This is a common pattern in business. For example, a new company hires many people up front, and then hiring slows over time. A new product often sells many units soon after it’s launched, and then sales level off.

This pattern is described as logarithmic because it’s typified by the shape of the curve made by the natural logarithm. Figure 13-25 shows a chart that plots the LN(x) function for various values of x.

The figure shows an Excel worksheet that uses the LN() function in column B to calculate the natural logarithm for values 1 through 20. A chart shows that the results graph as a logarithmic curve.
FIGURE 13-25 The natural logarithm produces a classic logarithmic trend pattern.
Plotting a logarithmic trendline

The easiest way to see the trend and forecast is to add a trendline—specifically, a logarithmic trendline—to the chart. Here are the steps to follow:

  1. Select the chart and, if more than one data series are plotted, click the series you want to work with.

  2. Select Design > Add Chart Element > Trendline > More Trendline Options to display the Format Trendline task pane.

  3. On the Trendline Options tab, select the Logarithmic option.

  4. In the Forecast section, use the Forward text box to enter the number of periods into the future you want the trendline extended.

  5. Select the Display Equation on Chart and Display R-Squared Value on Chart check boxes.

  6. Select Close (X). Excel inserts the trendline.

Figure 13-26 shows a worksheet that tracks the total number of employees at a new company. The chart shows the employee growth and a logarithmic trendline fitted to the data and extended eight periods into the future.

The figure shows a chart of monthly employee growth, with a logarithmic trendline added.
FIGURE 13-26 This worksheet shows a chart of total employee growth, with a logarithmic trendline added.
Calculating logarithmic trend and forecast values

The regression equation for a logarithmic trendline takes the following general form:

y = m * LN(x) + b

As usual, b and m are constants. So, knowing these values, given an independent value x, you can use this formula to compute its corresponding point on the trendline. In the trendline in Figure 13-26, these constant values are 182.85 and 157.04, respectively. So the formula for trend values becomes this:

=182.85 * LN(x) + 157.04

If x is a value between 1 and 16, you get a trend point for the existing data. To get a forecast, you use a value higher than 16. For example, using x equal to 17 gives a forecast value of 675 employees:

=182.85 * LN(17) + 157.04

Excel doesn’t have a function that enables you to calculate the values of b and m yourself. However, it’s possible to use the LINEST() function if you transform the pattern so that it becomes linear. When you have a logarithmic curve, you “straighten it out” by changing the scale of the x-axis to a logarithmic scale. Therefore, you can turn your logarithmic regression into a linear one by applying the LN() function to the known_x’s argument:

=LINEST(known_y's, LN(known_x's))

For example, the following array formula returns the values of m and b for the Total Employees data:

{=LINEST(B2:B17, LN(A2:A17))}

Figure 13-27 shows a worksheet that calculates m (cell E2) and b (cell F2) and that uses the results to derive values for the current trend and the forecasts (column C).

The figure shows an Excel worksheet with a formula in cell C18 that extends a logarithmic trend.
FIGURE 13-27 This figure shows the Total Employees worksheet, with existing trend and forecast values calculated by the logarithmic regression equation and values returned by the LINEST() function.

Working with a power trend

The exponential and logarithmic trendlines are both “extreme” in the sense that they have radically different velocities at different parts of the curve. The exponential trendline begins slowly and then takes off at an ever-increasing pace; the logarithmic trendline shoots off the mark and then levels off.

Most measurable business scenarios don’t exhibit such extreme behavior. Revenues, profits, margins, and employee head count often tend to increase steadily over time (in successful companies, anyway). If you’re analyzing a dependent variable that increases (or decreases) steadily with respect to some independent variable, but the linear trendline doesn’t give a good fit, you should try a power trendline. This is a pattern that curves steadily in one direction. To give you a flavor of a power curve, consider the graphs of the equations y = x2 and y = x–0.25 in Figure 13-28. The y = x2 curve shows a steady increase, whereas the y = x–0.25 curve shows a steady decrease.

The figure shows an Excel worksheet that in column B squares the values in column A and in column C raises the values in column A to the power of -0.25. A chart shows that the results graph as power curves.
FIGURE 13-28 Power curves are generated by raising x-values to some power.
Plotting a power trendline

If you think that your data fits the power pattern, you can quickly check by adding a power trendline to the chart. Here are the steps to follow:

  1. Select the chart and, if more than one data series are plotted, click the series you want to work with.

  2. Select Design > Add Chart Element > Trendline > More Trendline Options to display the Format Trendline task pane.

  3. On the Trendline Options tab, select the Power option.

  4. In the Forecast section, use the Forward text box to enter the number of periods into the future you want the trendline extended.

  5. Select the Display Equation On Chart and Display R-Squared Value On Chart check boxes.

  6. Select Close (X). Excel inserts the trendline.

Figure 13-29 shows a worksheet that compares the list price of a product (the independent variable) with the number of units sold (the dependent variable). As the chart shows, this relationship plots as a steadily declining curve, so a power trendline has been added. Note, too, that the trendline has been extended back to the $5.99 price point and forward to the $15.99 price point.

The figure shows a chart of list prices versus units sold with a power trendline added.
FIGURE 13-29 This figure shows a product’s list price versus unit sales, with a power trendline added.
Calculating power trend and forecast values

The regression equation for a power trendline takes the following general form:

y = mxb

As usual, b and m are constants. Given these values and an independent value x, you can use this formula to compute its corresponding point on the trendline. In the trendline in Figure 13-29, these constant values are 423544 and -1.906, respectively. Plugging these into the general equation for a power trend gives the following:

=423544 * x ^ -1.906

If x is a value between 6.99 and 14.99, you get a trend point for the existing data. To get a forecast, you use a value lower than 6.99 or higher than 14.99. For example, using x equal to 16.99 gives a forecast value of 1,915 units sold:

=423544 * 16.99 ^ -1.906

As with the logarithmic trend, Excel doesn’t have functions that enable you to directly calculate the values of b and m. However, you can “straighten” a power curve by changing the scale of both the y-axis and the x-axis to a logarithmic scale. Therefore, you can transform the power regression into a linear regression by applying the natural logarithm—the LN() function—to both the known_y’s and known_x’s arguments:

=LINEST(LN(known_y's), LN(known_x's))

Here’s how the array formula looks for the list price versus units sold data:

{=LINEST(LN(B2:B10, LN(A2:A10))}

The first cell of the array holds the value of b. Because it’s used as an exponent in the regression equation, you don’t need to “undo” the logarithmic transform. However, the second cell in the array—let’s call it m1—holds the value of m in its logarithmic form. Therefore, you need to “undo” the transform by applying the EXP() function to the result.

Figure 13-30 shows a worksheet performing these calculations. The LINEST() array is in E2:F2, and E2 holds the value of b (cell E2). To get m, cell G2 uses the formula =EXP(F2). The worksheet uses these results to derive values for the current trend and the forecasts (column C).

The figure shows an Excel worksheet with a formula in cell C11 that extends a power trend.
FIGURE 13-30 The worksheet of list price versus units sold, with existing trend and forecast values calculated by the power regression equation and values returned by the LINEST() function.

Using polynomial regression analysis

The trendlines you’ve seen so far have been unidirectional. That’s fine if the curve formed by the dependent variable values is also unidirectional, but that’s often not the case in a business environment. Sales fluctuate, profits rise and fall, and costs move up and down, thanks to varying factors such as inflation, interest rates, exchange rates, and commodity prices. For these more complex curves, the trendlines covered so far might not give either a good fit or good forecasts.

If that’s the case, you might need to turn to a polynomial trendline, which is a curve constructed out of an equation that uses multiple powers of x. For example, a second-order polynomial regression equation takes the following general form:

y = m2x2 + m1x + b

The values m2, m1, and b are constants. Similarly, a third-order polynomial regression equation takes the following form:

y = m2x3 + m2x2 + m1x + b

These equations can go as high as a sixth-order polynomial.

Plotting a polynomial trendline

Here are the steps to follow to add a polynomial trendline to a chart:

  1. Select the chart and, if more than one data series are plotted, click the series you want to work with.

  2. Select Design > Add Chart Element > Trendline > More Trendline Options to display the Format Trendline task pane.

  3. On the Trendline Options tab, select the Polynomial option.

  4. Use the Order spin box to select the order of the polynomial equation you want.

  5. In the Forecast section, use the Forward text box to enter the number of periods into the future you want the trendline extended.

  6. Select the Display Equation On Chart and Display R-Squared Value On Chart check boxes.

  7. Select Close (X). Excel inserts the trendline.

Figure 13-31 displays a simple worksheet that shows annual profits over 10 years, with accompanying charts showing two different polynomial trendlines.

The figure shows two charts of profits per year. The top chart includes a third-order polynomial trendline, and the bottom chart includes a fifth-order polynomial trendline.
FIGURE 13-31 This figure shows the annual profits with two charts showing different polynomial trendlines.

Generally, the higher the order you use, the tighter the curve will fit your existing data, but the more unpredictable will be your forecasted values. In Figure 13-31, the top chart shows a third-order polynomial trendline, and the bottom chart shows a fifth-order polynomial trendline. The fifth-order curve (R2 = 0.6236) gives a better fit than the third-order curve (R2 = 0.3048).

However, the forecasted profit for the 11th year seems more realistic in the third-order case (about 17) than in the fifth-order case (about 26).

In other words, you’ll often have to try different polynomial orders to get a fit that you are comfortable with and forecasted values that seem realistic.

Calculating polynomial trend and forecast values

You’ve seen that the regression equation for an nth-order polynomial curve takes the following general form:

y = mnxn + ... + m2x2 + m1x + b

So, as with the other regression equations, if you know the value of the constants, for any independent value x, you can use this formula to compute its corresponding point on the trendline. For example, the top trendline in Figure 13-31 is a third-order polynomial, so we need the values of m3, m2, and m1, as well as b. From the regression equation displayed on the chart, we know that these values are, respectively, -0.0634, 1.1447, -5.4359, and 22.62. Plugging these into the general equation for a third-order polynomial trend gives the following:

=-0.0634 * x ^ 3 + 1.1447 * x ^ 2 + -5.4359 * x + 22.62

If x is a value between 1 and 10, you get a trend point for the existing data. To get a forecast, you use a value higher than 10. For example, using x equal to 11 gives a forecast profit value of 16.9:

=-0.0634 * 11 ^ 3 + 1.1447 * 11 ^ 2 + -5.4359 * 11 + 22.62

However, you don’t need to put yourself through these intense calculations because the TREND() function can do it for you. The trick here is to raise each of the known_x’s values to the powers from 1 to n for an nth-order polynomial:

{=TREND(known_y's, known_x's ^ {1,2,...,n})}

For example, here’s the formula to use to get the existing trend values for a third-order polynomial using the year and profit ranges from the worksheet in Figure 13-31:

{=TREND(B2:B11, A2:A11 ^ {1,2,3})}

To get a forecast value, you raise each of the new_x’s values to the powers from 1 to n for an nth-order polynomial:

{=TREND(known_y's, known_x's ^ {1,2,...,n}, new_x's ^ {1,2,...,n})}

For the profits forecast, if A12 contains 11, the following array formula returns the predicted value:

{=TREND(B2:B11, A2:A11 ^ {1,2,3}, A12 ^ {1,2,3})}

Figure 13-32 shows a worksheet that uses this TREND() technique to compute both the trend values for years 1 through 10 and a forecast value for year 11 for all the second-order through sixth-order polynomials.

The figure shows an Excel worksheet with a formula in cell D12 that extends a polynomial trend.
FIGURE 13-32 This figure shows the profits worksheet, with existing trend and forecast values calculated by the TREND() function.

Note, too, that Figure 13-32 calculates the mn values and b for each order of polynomial. This is done using LINEST() by again raising each of the known_x’s values to the powers from 1 to n, for an nth-order polynomial:

{=LINEST(known_y's, known_x's ^ {1,2,...,n})}

The formula returns an n + 1×1 array in which the first n cells contain the constants mn through m1, and then the n+1st cell contains b. For example, the following formula returns a 3×1 array of the constant values for a third-order polynomial using the year and profit ranges:

{=LINEST(B2:B11, A2:A11 ^ {1,2,3})}

Using multiple regression analysis

Focusing on a single independent variable is a useful exercise because it can tell you a great deal about the relationship between the independent variable and the dependent variable. However, in the real world of business, the variation that you see in most phenomena is a product of multiple influences. The movement of car sales isn’t solely a function of interest rates; it’s also affected by internal factors such as price, advertising, warranties, and factory-dealer incentives, as well as external factors such as total consumer disposable income and the employment rate.

The good news is that the linear regression techniques you learned earlier in this chapter are easily adapted to multiple independent variables.

As a simple example, let’s consider a sales model in which the units sold—the dependent variable—is a function of two independent variables: advertising costs and list price. The worksheet in Figure 13-33 shows data for 10 products, each with its own advertising costs (column A) and list price (column B), as well as the corresponding unit sales (column C). The upper chart shows the relationship between units sold and list price, whereas the lower chart shows the relationship between units sold and advertising costs. As you can see, the individual trends look about right: Units sold goes down as the list price goes up; units sold goes up as the advertising costs go up.

The figure shows an Excel worksheet with the advertising costs, list prices, and units sold for ten products. The top chart shows that units sold falls as the list price rises, while the bottom chart shows that units sold increases as the advertising costs increase.
FIGURE 13-33 This worksheet shows raw data and trendlines for units sold versus advertising costs and list price.

However, the individual trends don’t tell us much about how advertising and price together affect sales. Clearly, a low advertising budget combined with a high price will result in lower sales; conversely, a high advertising budget combined with a low price should increase sales. What we really want, of course, is to attach some hard numbers to these seat-of-the-pants speculations. You can get those numbers using that linear regression workhorse, the TREND() function.

To use TREND() when you have multiple independent variables, you expand the known_x’s argument so that it includes the entire range of independent data. In Figure 13-33, for example, the independent data resides in the range A2:B11, so that’s the reference you plug into the TREND() function. Here’s the array formula for computing the existing trend values:

=TREND(C2:C11, A2:B11)

In multiple regression analysis, you’re most often interested in what-if scenarios. What if you spend $6,000 in advertising on a $5.99 product? What if you spend $1,000 on a $9.99 product?

To answer these questions, you plug the values into the new_x’s argument as an array. For example, the following array formula returns the predicted number of units that will sell if you spend $6,000 in advertising on a $5.99 product:

=TREND(C2:C11, A2:B11, {6000, 5.99)

Figure 13-34 shows a worksheet that puts the multiple regression form of TREND() to work. The values in D2:D11 are for the existing trend, and values in D12:D13 are forecasts.

The figure shows an Excel worksheet that uses the multiple regression form of the TREND() function to calculate an existing trend and forecast.
FIGURE 13-34 This worksheet shows the trend and forecast values calculated by the multiple regression form of the TREND() function.

Notice, too, that the worksheet in Figure 13-34 includes the statistics generated by the LINEST() function. The returned array is three columns wide because you’re dealing with three variables (two independent and one dependent). Of particular interest is the value for R2 (cell F4)—0.946836. It tells us that the fit between unit sales and the combination of advertising and price is an excellent one, which gives us some confidence about the validity of the predicted values.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.133.111.85