Having analyzed confidence intervals and p-values, we would choose only those variables passing the minimum thresholds of accuracy and significance for further and final analysis. Next, we would analyze the actual slopes themselves, notably for size.
The three analytical
issues when analyzing regression slopes are direction,
statistical size, and practical
implications. It is for this analysis that you
did the regression in the first place.
First, it is obviously crucial whether an independent variable is associated with the dependent variable in a positive or negative way. If the slope is positive in sign, then when the independent variable increases by a unit, the dependent variable is also expected to be higher. A negative slope, in contrast, suggests that when the independent variable increases by a unit, the dependent variable is expected to decrease.
To reiterate the prior section: do not analyze direction unless the initial analysis of significance and accuracy has been passed. Too many beginners look at a non-significant variable with a small, negative slope and wrongly infer that there is a negative relationship between the independent variable and the dependent variable. The negative slope could just be an artifact; the lack of significance means you cannot really tell whether the true relationship is positive, negative, or zero.
Having ascertained the direction of the significant variables, the key question is the size of each effect. To what extent is the predictor associated with the dependent variable, and which predictors are comparatively more or less strongly associated with it? We ascertain this through both unstandardized and standardized slopes.
Unstandardized Slopes
We start by analyzing unstandardized slopes (often referred to as ‘B’s), as seen in the “Parameter Estimate” column of the regression coefficients table (e.g. see Figure 13.27 The regression parameters table (slopes and intercept) in SAS). These assess the raw impact of the independent variable on the dependent variable: specifically, the expected change in the dependent variable (in its units) if the independent variable increases by one of its units.
In the example above, Enquiries is related to Sales via an unstandardized slope of 1395, implying that for every extra enquiry the customer makes per month on average, first-year sales tend to increase by about $1,395. Is this large enough to take seriously? The researcher would consider this in light of the nature of these two variables as well as the aim of the regression. Is this a large increase in sales given average sales? How much spread is there in enquiries? Can we affect this variable?
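To make the definition concrete, here is a minimal Python sketch that computes an unstandardized slope using the usual least-squares formula. The numbers are invented for illustration and are not the case data behind Figure 13.27.

```python
from statistics import mean

# Hypothetical data: monthly enquiries (x) and first-year sales in dollars (y).
# Illustrative values only, not the chapter's actual case data.
enquiries = [2, 4, 5, 7, 9, 11]
sales = [4000, 7200, 8500, 11400, 14100, 16900]

def unstandardized_slope(x, y):
    """Least-squares slope: expected change in y per one-unit increase in x."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (c - my) for a, c in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

b = unstandardized_slope(enquiries, sales)
print(round(b, 1))  # dollars of extra first-year sales per extra enquiry
```

The result is read exactly as in the text: each additional enquiry per month is associated with roughly that many extra dollars of first-year sales, in the original units of both variables.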
There are two potential issues with analyzing raw regression slopes.
- If you have more than one predictor variable, their slopes cannot be compared to each other if the independent variables have different scales. This is because, if you go back to the fundamental definition of these slope coefficients, they are the expected change in the dependent variable when the independent variable increases by 1 unit. If the predictors have different units, then a 1-unit increase in one is not the same thing as a 1-unit increase in another. To compare them would be comparing apples and oranges. For instance, each of the variables Enquiries, Trust and Satisfaction has its own slope. Trust is measured on a 1-100 scale, whereas Enquiries is a count. An increase of one unit in trust on that scale is not comparable to one extra enquiry per month on average – they are completely different things with different scales! Similarly, neither is comparable to a 1-unit increase in Satisfaction on its 1-7 scale.
- Sometimes unstandardized slopes are difficult to analyze because you do not necessarily understand the scales of measurement. For instance, although we can understand the 1-7 scale of Satisfaction on one level, it is hard to be sure how meaningful a 1-unit increase really is.
For these reasons, we also use standardized slopes to help us grasp the sizes of slopes, as discussed next.
Standardized Slopes
Why would we need standardized
slopes and what do they mean?
Standardized regression slopes come from a regression done on our variables after they have all been rescaled to have equal means of 0 and equal standard deviations of 1. This means that we can compare the slopes: a 1-unit increase in one standardized predictor is the same size as a 1-unit increase in another.
But there is more. In a standardized regression, a 1-unit increase in a variable specifically equates to a 1 standard deviation (SD) increase. If the independent variable is age, we would be talking about a change in the age of the salesperson from the average age to 1 SD higher than the average. Now, recall from the Chapter 7 discussion of standard deviations that 1 SD on either side of the mean captures about 68% of the data. Therefore, we are talking about a change in the independent variable from the average to one SD above average.
Therefore a standardized regression slope tells us by how many standard deviations the dependent variable is expected to change if there is an increase of 1 standard deviation in the independent variable (again holding all other independent variables constant). Standardized regression coefficients are read like correlations and have roughly the same meaning: they generally run between -1 and +1, with values closer to -1 or +1 indicating a stronger slope. For example, a standardized slope of 0.65 can be read much like a correlation of .65.
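The relationship between the two kinds of slope can be sketched in a few lines of Python, again with made-up numbers rather than the chapter's case data. Z-scoring both variables and re-fitting gives the standardized slope, which also equals the unstandardized slope multiplied by sd(x)/sd(y):

```python
from statistics import mean, pstdev

# Illustrative data only (not the chapter's case data).
x = [2, 4, 5, 7, 9, 11]                      # e.g. enquiries per month
y = [5000, 6800, 9000, 10200, 15500, 15600]  # e.g. first-year sales

def slope(xs, ys):
    mx, my = mean(xs), mean(ys)
    return (sum((a - mx) * (c - my) for a, c in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

def z(scores):
    """Rescale to mean 0 and standard deviation 1."""
    m, s = mean(scores), pstdev(scores)
    return [(v - m) / s for v in scores]

b = slope(x, y)           # unstandardized: y-units per x-unit
beta = slope(z(x), z(y))  # standardized: SDs of y per SD of x
# Shortcut: beta = b * sd(x) / sd(y) gives the same number.
print(round(beta, 3), round(b * pstdev(x) / pstdev(y), 3))
```

With a single predictor, the standardized slope is simply the correlation between x and y; with several predictors the same rescaling logic applies, but each slope additionally holds the other predictors constant.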
The great thing about these standardized slopes is that, because all the variables now have the same-sized change, the slopes can be directly compared. A standardized slope of .87 is steeper than one of .75, and a standardized slope of -.34 is also steeper than one of .12 (although one is negative and the other positive, the effect of the first is bigger in absolute terms).
I suggest calculating and assessing both unstandardized and standardized slopes when doing linear regression. Unstandardized slopes are always best for direct interpretation where this is possible: always try to translate them into direct meanings such as “expected sales increase in dollars if....” Standardized slopes are great for comparing the effects of predictors (a bigger standardized slope means a more powerful independent variable) and for situations where the meaning of the unstandardized slopes is difficult to interpret.
Let us look again at Figure 13.27 The regression parameters table (slopes and intercept) in SAS. As seen there, Trust has the largest standardized slope of .41 (i.e. when a customer’s trust increases by one standard deviation, sales are expected to increase by .41 standard deviations). Note that Trust does not have the biggest unstandardized slope: raw slopes are not comparable to each other – it is the standardized slopes that can be compared. Because .26 (the standardized slope of Enquiries) is about two-thirds the size of the Trust slope of .41, Enquiries has about two-thirds the impact on sales that Trust does.
The next section discusses
the issue of how to interpret unstandardized slopes for dummy variables.
Interpreting the Slopes of Dummy Variables
Remember that, in specifying categorical and ordinal predictors, we took the step of translating these into dummy variables (columns of zeros and ones), with one category omitted as the reference.
In the case example we have one categorical and one ordinal variable, both of which we converted to dummy variables with one omitted (reference) category. The variable “Premium” is categorical: its dummy takes the value 1 for “Premium” customers, with “Freeware” being the reference. The variable “Size” is ordinal: we converted it to two dummies, namely “Small” and “Medium”, with “Big” left as the reference.
The question with such variables is how to interpret their slopes. Here, we interpret any given dummy variable slope as the expected level of the dependent variable for that category of the independent variable (predictor) compared to the omitted (reference) category.
For instance, one of the dummies derived from the ordinal variable in the case example is Small, in which a “1” indicates that the customer is categorized as small in size. According to Figure 13.27 The regression parameters table (slopes and intercept) in SAS, the unstandardized (“Regression Coefficient”) slope for this variable is approximately -7471. Because this is a dummy variable, the slope is a comparison of the dependent variable (sales) level for small customers against the reference category (in this case, big customers). In other words, we would read this slope as “small customers bought on average $7,471 less in first-year services than big customers.” Interestingly, if you look at medium-sized customers, you will see that they bought more on average than big customers. Note that these averages are controlled for the other variables, so they will differ from the literal average comparisons we saw earlier in the book.
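To see why a dummy slope reads as a comparison against the reference category, consider this small Python sketch with invented sales figures (not the chapter's data): with a single 0/1 dummy and no other predictors, the fitted slope is exactly the difference between the two group means.

```python
from statistics import mean

# Invented first-year sales figures, for illustration only.
sales_small = [3000, 4500, 5200]    # Small dummy = 1
sales_big = [11000, 12500, 13800]   # reference category, dummy = 0

dummy = [1] * len(sales_small) + [0] * len(sales_big)
sales = sales_small + sales_big

# Least-squares slope of sales on the dummy.
md, ms = mean(dummy), mean(sales)
slope = (sum((d - md) * (s - ms) for d, s in zip(dummy, sales))
         / sum((d - md) ** 2 for d in dummy))

print(round(slope, 1))                                # slope on the Small dummy
print(round(mean(sales_small) - mean(sales_big), 1))  # same number
```

In the chapter's multiple regression, the dummy slope is the same kind of comparison, except adjusted for the other predictors in the model, which is why it need not match the raw difference in group means.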
Despite the possibilities of this process, when the exact measurement units of either the dependent or independent variables are unknown, unstandardized slopes are always going to be rather hard to work with. One major difficulty is that, at first glance, unstandardized slopes cannot be compared to each other, because each independent variable has a different effective unit of measurement. This is where standardized slopes come in, as discussed in the next section.