CHAPTER 8

Qualitative Events and Seasonality in Multiple Regression Models

Chapter 8 Preview

When you have completed reading this chapter you will be able to:

  • Understand the meaning of “seasonality” in business data.
  • Take seasonality into account as a pattern in your models.
  • Account for seasonality by using dummy variables.
  • Understand the use of dummy variables to represent a wide range of events.
  • Modify the Market Share model from Chapter 7 for a nonrecurring event.
  • Use dummy variables to account for seasonality in women’s clothing sales.

Introduction

Most of the variables you may want to use in developing a regression model are readily measurable with values that extend over a wide range and are interval or ratio data. All of the examples you have examined so far have involved variables of this type.

However, on occasion you may want to account for the effect of some event or attribute that has only two (or a few) possible cases. It either “is” or “is not” something. It either “has” or “does not have” some attribute. Some examples include the following: a month either is or is not June; a person either is or is not a woman; in a particular time period, there either was or was not a labor disturbance; a given quarter of the year either is or is not a second quarter; a teacher either has a doctorate or does not have one; a university either does or does not offer an MBA; and so forth.

The manner in which you can account for these noninterval and nonratio data is by using dummy variables. The term “dummy” variable refers to the fact that this class of variable “dummies,” or takes the place of, something you wish to represent that is not easily described by a more common numerical variable. A dummy variable (or several dummy variables) can be used to measure the effects of what may be qualitative attributes. A dummy variable is assigned a value of 1 or 0, depending on whether or not the particular observation has a given attribute. To explain the use of dummy variables, consider two examples.

Another Look at Miller’s Foods’ Market Share Regression Model

The first example is an extension of the multiple linear regression discussed in Chapter 7 dealing with a model of Miller’s Foods’ market share (MS). In that situation, you saw a regression model in which the dependent variable MS was a function of three independent variables: price (P), advertising (AD), and an index of competitors’ advertising (CAD). You saw the results of that regression in Table 7.2 and in Figure 7.1.

However, it so happens that in the second quarter of the third year (quarter 10), another major firm had a fire that significantly reduced its production and sales. As an analyst, you might suspect that this event could have influenced the MS of all firms for that period (including Miller’s Foods). At the very least, it probably introduced some noise (error) into the data used for the regression and may have reduced the explanatory ability of the regression model.

You can use a dummy variable to account for the influence of the fire at the competitor’s facility on Miller’s Foods’ MS and to measure its effect. To do so, you simply create a new variable—call it D—that has a value of 0 for every observation except the tenth quarter and a value of 1 for the tenth quarter. The data from Table 7.1 are reproduced in Table 8.1, but now include the addition of this dummy variable in the last column.
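As a hedged sketch of how such a variable might be built in practice (assuming the data sit in a pandas DataFrame; the column names here are illustrative, not from the original data file):

```python
import pandas as pd

# Twelve quarterly observations; the dummy D is 1 only in quarter 10,
# the period of the fire at the competitor's facility.
df = pd.DataFrame({"Period": range(1, 13)})
df["D"] = (df["Period"] == 10).astype(int)

print(df["D"].tolist())  # 0 everywhere except the tenth entry
```

The comparison `(df["Period"] == 10)` produces True/False values, which `astype(int)` converts to the 1/0 coding a dummy variable requires.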

Table 8.1 Twelve quarters (three years) of Miller’s Foods’ MS multiple regression data with the dummy variable

Period    MS      P       AD     CAD    D
  1       19     5.2       500    11    0
  2       17     5.32      550    11    0
  3       14     5.48      550    12    0
  4       15     5.6       550    12    0
  5       18     5.8       550     9    0
  6       16     6.03      660    10    0
  7       16     6.01      615    10    0
  8       19     5.92      650    10    0
  9       23     5.9       745     9    0
 10       27     5.85      920    10    1
 11       23     5.8     1,053    11    0
 12       21     5.85      950    11    0

Using the data from Table 8.1 in a multiple regression analysis, you get the following equation for MS:

MS = 72.908 – 7.453(P) + .017(AD) – 2.245(CAD) + 4.132(D)

The complete regression results are shown in Table 8.2. You see that the coefficient of the dummy variable representing the fire is 4.132. Because it is positive, it indicates that when there was a fire at a major competitor’s facility, the market share (MS) of Miller’s Foods increased. This certainly makes sense. In addition, you can say that this event accounted for 4.132 percentage points of the 27 percent MS obtained during that second quarter of Year 3.
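To see how the fitted equation reproduces the tenth-quarter observation, you can evaluate it at that quarter’s values from Table 8.1 (P = 5.85, AD = 920, CAD = 10, D = 1). This sketch uses the rounded coefficients as printed, so the result differs slightly from the actual value of 27:

```python
def predict_ms(p, ad, cad, d):
    # Fitted equation from Table 8.2, using the coefficients as printed
    # (so the result carries a little rounding error).
    return 72.908 - 7.453 * p + 0.017 * ad - 2.245 * cad + 4.132 * d

# Quarter 10 values from Table 8.1, with the fire dummy switched on
ms_q10 = predict_ms(5.85, 920, 10, 1)
print(round(ms_q10, 2))  # 26.63, close to the actual MS of 27
```

Setting `d=0` in the same call shows what the model would have predicted for that quarter had the fire not occurred: about 4.13 percentage points less.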

You should compare these regression results with those from Table 7.2. Note particularly the following:

  1. The adjusted R-squared increased from 0.812 to 0.906.
  2. The standard error of the regression (SEE) fell from 1.676 to 1.183.

Table 8.2 Miller’s Foods’ MS regression results including a dummy variable

Regression Statistics
Multiple R              0.970        DW = 1.83
R-square                0.940
Adjusted R-square       0.906
Standard error          1.183
Observations           12

ANOVA           df      SS         MS        F         Sig. F
Regression       4     154.202    38.550    27.541     0.000
Residual         7       9.798     1.400
Total           11     164

               Coefficients   Std. error   t Stat    p-Value   p/2
Intercept        72.908        13.955       5.225     0.001    0.001
P                –7.453         1.939      –3.845     0.006    0.003
AD                0.017         0.002       7.045     0.000    0.000
CAD              –2.245         0.468      –4.802     0.002    0.001
D                 4.132         1.374       3.008     0.020    0.010

Figure 8.1 Miller’s Foods’ MS: actual and predicted with dummy variable. The line representing the estimated MS for Miller’s Foods is derived from the following regression model: MS = 72.908 – 7.453(P) + .017(AD) – 2.245(CAD) + 4.132(D)

You see that the regression with the dummy variable is considerably better than the first model presented in Chapter 7. This is further illustrated by the graph in Figure 8.1, which shows the actual MS and the MS as estimated with the multiple regression model above. Comparing Figure 8.1 with Figure 7.1 provides a good visual representation of how much the model is improved by adding the dummy variable to account for the abnormal influence in the tenth quarter.

Modeling the Seasonality of Women’s Clothing Sales

As a second example of the use of dummy variables, let’s look at how they can be used to account for seasonality. Table 8.3 contains data for women’s clothing sales (WCS) on a monthly basis beginning in January 2000. The full series of sales, from January 2000 to March 2011, is also graphed in Figure 8.2. You see in the graph that December sales are always high due to the heavy holiday sales experienced by most retailers, while January tends to have relatively low sales (when holiday giving turns to bill paying).

Eleven dummy variables can be used to identify and measure the seasonality in the firm’s sales. In Table 8.3, you see that for each February observation the February dummy variable has a value of 1, while the March and April dummies are 0 (as are the dummy variables representing all the remaining months). Note that for each January none of the 11 dummy variables has a value of 1. That is because the first month (i.e., January) is not February, March, or any other month. Having a dummy variable for every month other than January makes January the base month for the model.

The dummy variables for the other months will then determine how much, on average, those months vary from the first month base period. Any of the 12 months could be selected as the base. In this example, January is selected as the base month because it is the lowest month of the year on average, after taking the upward trend in the data into account. Thus, you would expect the February, March, and April dummy variables to all have positive coefficients in the regression model.
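One way to build the 11 monthly dummies is with pandas; this is a hedged sketch (the dates and sample length here are illustrative) in which January is dropped so that it serves as the base month:

```python
import pandas as pd

# Three illustrative monthly dates; a Categorical over all 12 month
# names guarantees a column for every month even in a short sample.
dates = pd.date_range("2000-01-01", periods=3, freq="MS")
month_order = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
months = pd.Categorical(dates.strftime("%b"), categories=month_order)
dummies = pd.get_dummies(months, drop_first=True).astype(int)  # Jan dropped

print(list(dummies.columns))   # Feb ... Dec: the 11 dummy variables
print(dummies.iloc[0].sum())   # 0: a January row has no dummy set
print(dummies.iloc[1]["Feb"])  # 1: February's own dummy
```

The `drop_first=True` argument is what makes January the base: its column is omitted, so a January observation is coded as all zeros, exactly as in Table 8.3.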

First let’s see what happens if you just estimate a simple regression using personal income (PI) as the only independent variable. The equation that results is:

WCS = 1,187.12 + 0.165(PI)

Table 8.3 Women’s clothing sales (in M$) and 11 seasonal dummy variables. Only the first two years are shown so that you see how the dummy variables are set up

This function is plotted in the top graph of Figure 8.2 along with the raw sales data. As you can see, the estimate goes roughly through the center of the data without coming very close to any month’s actual sales on a consistent basis.

Now if you add the 11 dummy variables to the regression model, the multiple regression model that results is:

WCS = 572.16 + 0.16(PI) + 147.36(Feb) + 765.96(Mar) + 859.48(Apr)
      + 900.81(May) + 631.60(Jun) + 411.26(Jul) + 584.74(Aug)
      + 579.07(Sep) + 701.99(Oct) + 901.60(Nov) + 2,094.95(Dec)

Figure 8.2 Women’s clothing sales data. Compare these two graphs to see how helpful the dummy variables for seasonality can be

Table 8.4 Regression results for WCS with seasonal dummy variables

Regression Statistics
Multiple R              0.963        DW = 0.52
R-square                0.927
Adjusted R-square       0.919
Standard error        163.410
Observations          135

ANOVA           df      SS              MS            F          Sig F
Regression      12      41,101,414      3,425,118     128.269    0.000
Residual       122       3,257,726.8       26,702.68
Total          134      44,359,141

               Coefficients   Std. error   t Stat    p-Value   p/2
Intercept        572.163      113.464       5.043     0.000    0.000
PI                 0.157        0.010      16.061     0.000    0.000
Feb              147.356       66.712       2.209     0.029    0.015
Mar              765.957       66.715      11.481     0.000    0.000
Apr              859.476       68.218      12.599     0.000    0.000
May              900.811       68.212      13.206     0.000    0.000
Jun              631.599       68.211       9.259     0.000    0.000
Jul              411.262       68.211       6.029     0.000    0.000
Aug              584.738       68.212       8.572     0.000    0.000
Sep              579.067       68.213       8.489     0.000    0.000
Oct              701.994       68.216      10.291     0.000    0.000
Nov              901.599       68.219      13.216     0.000    0.000
Dec            2,094.946       68.232      30.703     0.000    0.000

Now look at the lower graph in Figure 8.2, which shows estimated sales based on a model that includes PI and the 11 monthly dummy variables superimposed on the actual sales pattern. In many months, the estimated series performs so well that the two cannot be distinguished from one another. The complete regression results for this model are found in Table 8.4.

The interpretation of the regression coefficients for the seasonal dummy variables is straightforward. Remember that as this model is constructed, the first month (January) is the base period. Thus, the coefficients for the dummy variables can be interpreted as follows:

  • For the February dummy the regression coefficient 147.356 indicates that on average February sales are $147.356 million above January (base period) sales.
  • For the March dummy the regression coefficient 765.957 indicates that on average March sales are $765.957 million above January (base period) sales.
  • For the April dummy the regression coefficient 859.476 indicates that on average April sales are $859.476 million above January (base period) sales.

The remaining eight dummy variables would be interpreted in the same manner.

The regression coefficient for income (0.157) means that WCS are increasing at an average of $0.157 million for every billion dollar increase in PI. Note that WCS are denominated in millions of dollars while income is denominated in billions of dollars. Knowing the units of measurement for each variable is important.
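A sketch of how these coefficients combine, using the values from Table 8.4; the PI figure here is purely hypothetical, chosen only to illustrate the arithmetic and the units:

```python
# Seasonal lifts (in $ millions) over the January base, from Table 8.4.
# Only a few months are shown; January's lift is zero by construction.
seasonal = {"Jan": 0.0, "Feb": 147.356, "Dec": 2094.946}

def predict_wcs(pi_billions, month):
    # PI is in billions of dollars; WCS comes out in millions of dollars.
    return 572.163 + 0.157 * pi_billions + seasonal[month]

# At a hypothetical PI of $10,000 billion: December minus January
# isolates the seasonal lift, and a $1 billion PI rise adds $0.157 million.
print(round(predict_wcs(10_000, "Dec") - predict_wcs(10_000, "Jan"), 3))  # 2094.946
print(round(predict_wcs(10_001, "Jan") - predict_wcs(10_000, "Jan"), 3))  # 0.157
```

Subtracting the two same-PI predictions is exactly how a dummy coefficient should be read: everything else cancels, leaving only the month’s average difference from the base.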

Let’s summarize some of the results of these two regression analyses of WCS:

                                        Without seasonal    With seasonal
                                        dummy variables     dummy variables
R2 and Adjusted R2                          0.167               0.919
F-Statistic                                27.8                128.3
Standard error of regression (SEE)        525.16              163.41

The model that includes the dummy variables to account for seasonality is clearly the superior model based on these diagnostic measures. The better fit of the model can also be seen visually in Figure 8.2. However, the DW statistic (0.52) in the larger model does indicate the presence of positive serial correlation.
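The DW statistic cited here can be computed directly from a model’s residuals. This is a hedged sketch of the standard formula, not tied to the WCS data (which are not fully reproduced in this chapter):

```python
import numpy as np

def durbin_watson(resid):
    # DW = sum of squared successive residual differences divided by the
    # sum of squared residuals. Values near 2 suggest no first-order
    # serial correlation; values near 0 suggest positive correlation.
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Slowly drifting residuals mimic positive serial correlation,
# pushing DW toward 0 (as with the WCS model's DW of 0.52).
print(round(durbin_watson([1, 2, 3, 4, 5, 6]), 3))  # 0.055
```

A residual series that alternates sign from one period to the next would instead push DW above 2, signaling negative serial correlation.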

As you have seen, the development of a model such as this to account for seasonality is not difficult. Dummy variables are not only an easy way to account for seasonality; they can also be used effectively to account for many other effects that cannot be handled by ordinary continuous numeric variables.

The model for WCS can be further improved by adding two more variables: (1) the University of Michigan Index of Consumer Sentiment (UMICS), which measures the attitude of consumers regarding the economy; and (2) the women’s unemployment rate (WUR). Both new variables are statistically significant and have the expected signs. You would expect that as consumer confidence climbs, sales would also increase; the positive sign on the UMICS coefficient is consistent with this expectation. Likewise, as women’s unemployment rises you would expect sales of women’s clothing to decrease; the negative sign on the WUR coefficient is consistent with this belief. These results are reported in Table 8.5; note that the adjusted R-square has increased to 0.966. This model still has positive serial correlation, so you would want to be cautious about the t-tests. However, since all the t-ratios are again quite large, this may not be a problem. Also, because serial correlation does not bias the coefficients, they can be correctly interpreted.

Table 8.5 Regression results for WCS with the addition of the UMICS and WUR and seasonal dummy variables

Regression Statistics
Multiple R              0.985        DW = 1.29
R-square                0.970
Adjusted R-square       0.966
Standard error        105.403
Observations          135

ANOVA           df      SS             MS           F          Sig F
Regression      14      43,025,964     3,073,283    276.628    0.000
Residual       120       1,333,176        11,110
Total          134      44,359,141

               Coefficients   Std. error   t Stat    P-value   p/2
Intercept       –493.609      208.160      –2.371     0.019    0.010
PI                 0.240        0.010      24.092     0.000    0.000
WUR              –69.366        8.044      –8.624     0.000    0.000
UMICS              6.581        1.199       5.488     0.000    0.000
Feb              170.311       43.187       3.944     0.000    0.000
Mar              795.541       43.260      18.390     0.000    0.000
Apr              879.704       44.303      19.857     0.000    0.000
May              916.233       44.175      20.741     0.000    0.000
Jun              643.657       44.091      14.599     0.000    0.000
Jul              427.283       44.096       9.690     0.000    0.000
Aug              612.048       44.249      13.832     0.000    0.000
Sep              609.528       44.346      13.745     0.000    0.000
Oct              746.065       44.594      16.730     0.000    0.000
Nov              938.182       44.384      21.138     0.000    0.000
Dec            2,116.818       44.094      48.007     0.000    0.000

In the appendix to this chapter, you will see another complete example of how dummy variables can be used to account for seasonality in data. This example expands on the analysis of occupancy for the Stoke’s Lodge model.

What You Have Learned in Chapter 8

  • You understand the meaning of “seasonality” in business data.
  • You can take seasonality into account as a pattern in your models.
  • You know how to account for seasonality by using dummy variables.
  • You understand the use of dummy variables to represent a wide range of events.
  • You are able to modify the MS model from Chapter 7 for a nonrecurring event.
  • You can use dummy variables to account for seasonality in WCS.

Appendix

Stoke’s Lodge Final Regression Results

Introduction

Recall that the Stoke’s Lodge Monthly Room Occupancy (MRO) series represents the number of rooms occupied per month in a large independent motel. As you can see in Figure 8A.1, there is a downward trend in occupancy and what appears to be seasonality. The owners wanted to evaluate the causes for the decline. During the time period being considered, there was a considerable expansion in the number of casinos in the state, most of which had integrated lodging facilities. Also, gas prices (GP) were increasing. You have seen that multiple regression can be used to evaluate the degree and significance of these factors.

The Data

The dependent variable for this multiple regression analysis is MRO. These values are shown in Figure 8A.1 for the period from January 2002 to May 2010. You see sharp peaks that probably represent seasonality in occupancy. For a hotel in this location, December would be expected to be a slow month.

Figure 8A.1 Stoke’s Lodge room occupancy. This time-series plot of occupancy for Stoke’s Lodge appears to show considerable seasonality and a downward trend

Management had concerns about the effect of rising gas prices (GP) on the willingness of people to drive to this location (there is no good public, train, or air transportation available). They also had concerns about the influence that an increasing number of full service casino operations might have on their business because many of the casinos had integrated lodging facilities. Rather than use the actual number of casinos, the number of casino employees (CEs) is used to measure this influence because the casinos vary a great deal in size.

In Chapter 4, you saw the following model of monthly room occupancy (MRO) as a function of gas price (GP):

MRO = 9322.976 – 1080.448(GP)

You evaluated this model based on a four-step process and found it to be a pretty good model. The model was logical and statistically significant at a 95 percent confidence level. However, the coefficient of determination was only 14.8 percent (R2 = 0.148) and the model had positive serial correlation (DW = 0.691).

In Chapter 6, you saw that the original model could be improved by adding the number of casino employees (CE) in the state. This yielded the following model:

MRO = 15,484.483 – 1,097.686(GP) – 1,013.646(CE)

This time you evaluated the model based on a five-step process because you had two independent variables rather than one, so you needed to consider multicollinearity. Again the model seemed pretty good: it was logical and statistically significant at a 95 percent confidence level. The coefficient of determination increased to 20.1 percent (Adjusted R2 = 0.201). This model also had positive serial correlation (DW = 0.730). You will see that these results can be improved by using seasonal dummy variables.

A graph of the results from the model with just two independent variables is shown in Figure 8A.2. The dotted line representing the predictions follows the general downward movement of the actual occupancy. But what do you think is missing? Why are the residuals (errors) so large for most months? Do you think it could be because the model does not address the issue of seasonality in the data? If your answer is “yes” you are right. And, as with WCS, dummy variables can be used to deal with the seasonality issue.

Figure 8A.2 Actual MRO and MRO predicted (MROP). The predicted values are shown by the dotted line, based on MRO as a function of GP and CE. The coefficient of determination for this model is 20.1 percent

The Hypotheses

When you estimate a regression you know that you have certain hypotheses in mind. In this case you know from Chapters 4 and 6 that you have the following hypotheses related to GP and CE.

H0: β ≥ 0

H1: β < 0

That is, you have a research hypothesis that for both GP and CE the relationship with MRO is inverse. When either GP or CE increase you would expect MRO to decrease, so the alternative (research) hypothesis is that those slopes would be negative.

Based on conversations with Stoke’s Lodge management, you can expect that December is their slowest month of the year on average. You can set up dummy variables for all months except December to test to see if this is true. Thus, for the dummy variables representing January– November the hypotheses would be:

H0: β ≤ 0

H1: β > 0

This is because, by construction, you expect the coefficients of all 11 dummy variables to be positive: each represents a month that should have higher occupancy than December, the slowest (base) month.
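Since regression software reports two-tailed p-values, these one-tailed hypotheses are tested by halving the reported p-value and checking the coefficient’s sign. A small sketch of that decision rule, with inputs taken from Table 8A.2:

```python
def one_tailed_significant(coef, two_tailed_p, expected_sign, alpha=0.05):
    # For a directional hypothesis, the coefficient must have the
    # expected sign AND the halved p-value must beat alpha.
    sign_ok = (coef > 0) if expected_sign > 0 else (coef < 0)
    return sign_ok and (two_tailed_p / 2) < alpha

# May dummy from Table 8A.2: coefficient 3,671.132, p-value 0.000
print(one_tailed_significant(3671.132, 0.000, +1))  # True
# Jan dummy: coefficient 547.215, p-value 0.134, so p/2 = 0.067 > 0.05
print(one_tailed_significant(547.215, 0.134, +1))   # False
```

As the appendix notes, relaxing `alpha` to 0.10 is sometimes reasonable; with that looser standard the January and November dummies would pass.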

The first 15 months of data for this situation are shown in Table 8A.1. The regression results are shown in Table 8A.2.

The Results

From the regression output in Table 8A.2 you see some interesting results. First, the model follows the logic you would expect, with negative coefficients for GP and CE as well as positive coefficients for all 11 seasonal dummy variables. You also see from the F-test that, as a whole, the regression is very significant because the significance level (p-value) for F is 0.000. This means that you can be well over 95 percent confident that there is a significant relationship between MRO and the 13 independent variables in the model.

The t-ratios are all large enough that p/2 is less than the desired level of significance of 0.05 except for January and November. Recall that sometimes it is reasonable for you to drop your confidence level to 90 percent (a significance level of 10 percent, or 0.10). The coefficients for January and November satisfy this less stringent requirement. Look at the “Adjusted R-square” in Table 8A.2. It is 0.830, which tells you that this model explains about 83 percent of the variation in MRO for Stoke’s Lodge. This is much better than the 20.1 percent you saw when seasonality was not considered.

In Table 8A.2, you also see that the DW statistic of 1.883 is now much closer to 2.0, the ideal value. Based on the abbreviated DW table in an appendix to Chapter 4, you would be restricted to using the row for 40 observations and the column for 10 independent variables. Doing so, you would conclude that the test is indeterminate. If you look online you can find more extensive DW tables.1 For n = 100 and k = 13 you would find dl = 1.393 and du = 1.974 in which case the result is still indeterminate.

To evaluate the possibility of multicollinearity, you need to look at the correlation matrix shown in Table 8A.3. There you see that all of the correlations are quite small, so there is no multicollinearity problem with this model. You may wonder why all the correlations for pairs of monthly dummy variables are not the same. This is because the data do not contain an equal number of observations for every month: January through May each appear one more time than the other seven months.

Table 8A.1 The first 15 months of data. Notice the pattern of the values for the seasonal dummy variables. For each month in the “Date” column there is a one in the corresponding column representing that month

Date      MRO      GP     CE    Jan  Feb  Mar  Apr  May  June  July  Aug  Sept  Oct  Nov
Jan-02    6,575    1.35   5.3    1    0    0    0    0    0     0     0    0     0    0
Feb-02    7,614    1.46   5.2    0    1    0    0    0    0     0     0    0     0    0
Mar-02    7,565    1.55   5.2    0    0    1    0    0    0     0     0    0     0    0
Apr-02    7,940    1.45   5.2    0    0    0    1    0    0     0     0    0     0    0
May-02    7,713    1.52   5.1    0    0    0    0    1    0     0     0    0     0    0
Jun-02    9,110    1.81   5.0    0    0    0    0    0    1     0     0    0     0    0
Jul-02   10,408    1.57   5.0    0    0    0    0    0    0     1     0    0     0    0
Aug-02    9,862    1.45   5.0    0    0    0    0    0    0     0     1    0     0    0
Sep-02    9,718    1.60   5.1    0    0    0    0    0    0     0     0    1     0    0
Oct-02    8,354    1.57   5.8    0    0    0    0    0    0     0     0    0     1    0
Nov-02    6,442    1.56   6.6    0    0    0    0    0    0     0     0    0     0    1
Dec-02    6,379    1.46   6.4    0    0    0    0    0    0     0     0    0     0    0
Jan-03    5,585    1.51   6.4    1    0    0    0    0    0     0     0    0     0    0
Feb-03    6,032    1.49   6.4    0    1    0    0    0    0     0     0    0     0    0
Mar-03    8,739    1.43   6.3    0    0    1    0    0    0     0     0    0     0    0

Table 8A.2 Regression results for the full Stoke’s Lodge model

Regression Statistics
Multiple R              0.923        DW = 1.883
R-square                0.852
Adjusted R-square       0.830
Standard error        743.124
Observations          101

ANOVA           df      SS              MS            F         Sig F
Regression      13      277,032,357     21,310,181    38.589    0.000
Residual        87       48,044,350        552,233.9
Total          100      325,076,708

               Coefficients    Standard error   t Stat     p-Value   p/2
Intercept      10,270.912      1,065.996          9.635     0.000    0.000
GP             –1,396.275        117.891        –11.844     0.000    0.000
CE               –453.858        162.408         –2.795     0.006    0.003
Jan               547.215        361.715          1.513     0.134    0.067
Feb             1,303.684        361.737          3.604     0.001    0.000
Mar             2,705.610        362.592          7.462     0.000    0.000
Apr             2,140.244        363.438          5.889     0.000    0.000
May             3,671.132        365.245         10.051     0.000    0.000
June            3,650.990        375.260          9.729     0.000    0.000
July            4,305.826        374.846         11.487     0.000    0.000
Aug             4,500.630        375.350         11.990     0.000    0.000
Sept            3,466.594        374.798          9.249     0.000    0.000
Oct             2,550.378        372.192          6.852     0.000    0.000
Nov               547.899        371.616          1.474     0.144    0.072

Table 8A.3 The correlation matrix for all independent variables. All the correlation coefficients are quite small, so multicollinearity is not a problem

        GP     CE     Jan    Feb    Mar    Apr    May    June   July   Aug    Sept   Oct    Nov
GP      1
CE     –0.02   1
Jan    –0.08   0.02   1
Feb    –0.05   0.02  –0.10   1
Mar     0.01   0.00  –0.10  –0.10   1
Apr     0.06   0.01  –0.10  –0.10  –0.10   1
May     0.13   0.00  –0.10  –0.10  –0.10  –0.10   1
June    0.02  –0.08  –0.09  –0.09  –0.09  –0.09  –0.09   1
July    0.01  –0.07  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09   1
Aug     0.01  –0.08  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09   1
Sept    0.03  –0.06  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09   1
Oct    –0.02   0.04  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09   1
Nov    –0.05   0.10  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09  –0.09   1

The regression equation can be written based on the values in the “Coefficients” column in the regression results shown in Table 8A.2. The equation is:

MRO = 10,270.912 – 1,396.275(GP) – 453.858(CE) + 547.215(Jan) + 1,303.684(Feb) + 2,705.610(Mar) + 2,140.244(Apr) + 3,671.132(May) + 3,650.990(June) + 4,305.826(July) + 4,500.630(Aug) + 3,466.594(Sept) + 2,550.378(Oct) + 547.899(Nov)
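As a check on the arithmetic, the equation can be evaluated at one observation from Table 8A.1 (July 2002: GP = 1.57, CE = 5.0, July dummy = 1). This sketch uses the printed coefficients, so the prediction carries some rounding error:

```python
# Seasonal coefficients from Table 8A.2; only July is needed here.
# A December observation (the base month) would get no seasonal lift.
seasonal = {"July": 4305.826}

def predict_mro(gp, ce, month):
    return 10270.912 - 1396.275 * gp - 453.858 * ce + seasonal.get(month, 0.0)

print(round(predict_mro(1.57, 5.0, "July")))  # 10115, vs. the actual 10,408
```

The residual of roughly 293 rooms for this month is consistent with Figure 8A.3’s observation that some summer peaks are underestimated.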

Figure 8A.3 shows you how well the predictions from this equation fit the actual occupancy data. You see visually that the fit is quite good. There are some summer peaks that appear to be underestimated. However, overall the fit is good, and certainly better than the fit shown in Figure 8A.2 for which seasonal dummy variables were not included in the model.

Figure 8A.3 Actual and predicted values for MRO. The predicted values are shown by the dotted line, based on MRO as a function of GP, CE, and 11 seasonal (monthly) dummy variables
