The Three Essential Analytical Tools
Having discussed the basic data properties of mean, variance, and covariance, we now turn to the three covariance-based techniques referenced earlier: correlation analysis, regression analysis, and structural equations modeling (SEM). We discuss them in that order because correlation is probably the simplest to deploy and understand, and SEM is the most complex. However, simple does not mean inadequate or inferior to a more complex technique. Depending on the objectives of the research, correlation might provide all the information that the project requires. In such a case, it makes little sense to invest time and capital deploying a more complex technique such as SEM. Overall, our key objective in explaining these techniques is to provide readers with enough information on each so that they can be more astute buyers of research, in a better position to ask hard questions about which of these techniques is most applicable for their specific purpose.
Correlation Analysis
Correlation analysis is a covariance-based technique that is bivariate in nature: when running a correlation analysis, the focus is on only two variables at a time. Consider a survey that asks hotel guests to rate their overall satisfaction with the stay, as well as various areas of experience such as room cleanliness, technology, room service, and fitness facilities. Correlation analysis can be used to identify which of these experience areas is most strongly associated with guest satisfaction. One would start by correlating guest ratings of overall satisfaction with their ratings of room cleanliness. The results would provide information on three key parameters: the direction, magnitude, and statistical significance of the correlation between the two variables. One could then run a similar correlation analysis between overall satisfaction and, say, ratings of technology, and repeat the analysis for the other pairs of variables. At the end of such pairwise analysis, the correlation outputs could be compared side by side to assess which areas of experience have a greater impact on overall guest satisfaction.
Note, however, that these analyses are bivariate because each correlation analyzes only two variables at a time: overall guest satisfaction and one individual area of experience. In running the correlation between overall guest satisfaction and, say, room cleanliness, the analysis assumes that nothing else in the world matters or changes. In other words, it assumes that other variables, such as room service and fitness facilities, have no effect on the relationship between the two variables under study.
The correlation coefficient, the outcome of correlation analysis, is a number that can range between minus one and plus one. The sign indicates the first of the three considerations: the direction of the relationship. A positive correlation coefficient suggests that the two measures covary in the same direction; that is, if guests report favorable perceptions of room cleanliness, they are also likely to report favorable overall satisfaction with the stay. A negative correlation coefficient, on the other hand, suggests that the measures move in opposite directions; that is, more favorable perceptions of room cleanliness are accompanied by less favorable overall guest satisfaction. The second consideration is magnitude. A magnitude of one indicates a perfect association between the two measures. In the table that follows, for example, overall guest satisfaction would be estimated to have a correlation of plus one with guest revenue, because every one-point improvement in overall guest satisfaction is accompanied by exactly one hundred dollars of incremental revenue. A correlation coefficient of minus one reflects a similarly perfect relationship, but with the variables moving in opposite directions. Less-than-perfect relationships lower the magnitude of the correlation coefficient, and the absence of any relationship at all results in a correlation of zero.
Estimating the Relationship Between Two Variables
Guest satisfaction (1–10 scale) | Annual revenue ($) provided by guest |
1 | 100 |
2 | 200 |
3 | 300 |
4 | 400 |
5 | 500 |
6 | 600 |
7 | 700 |
8 | 800 |
9 | 900 |
10 | 1000 |
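The perfect association in this table can be checked with a few lines of code. The following is a minimal sketch in Python using only the standard library; the function name `pearson_r` and the reversed revenue column are illustrative assumptions, not part of the original discussion. It computes the correlation coefficient as the covariance of the two measures scaled by their variability.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products (the covariance numerator) and sums of squares.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Data copied from the table above.
satisfaction = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
revenue = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]

r_same_direction = pearson_r(satisfaction, revenue)

# Hypothetical contrast (not in the table): reverse the revenue column so that
# higher satisfaction is paired with lower revenue.
r_opposite_direction = pearson_r(satisfaction, revenue[::-1])

print(round(r_same_direction, 6), round(r_opposite_direction, 6))
```

Up to rounding, the first value should come out at plus one and the second at minus one, matching the two perfect relationships described in the text. Survey data would rarely be this clean, so real coefficients would fall somewhere in between.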
Regression Analysis
Regression analysis is very similar to correlation, but has two key differences. One, regression allows a researcher to work with more than two measures at the same time, and can therefore expand the scope of analysis from a bivariate world to a multivariate world. To continue with the hotel example discussed previously, regression analysis does not require each potential driver of overall guest satisfaction to be analyzed one at a time. Instead, it allows the research team to identify the relative importance of multiple areas of experience, such as room cleanliness, technology, room service, and fitness facilities, simultaneously in a single model. From a consumer behavior perspective, then, regression allows the research team to analyze a situation where the respondent is seen as evaluating all these areas of experience together in providing an overall guest satisfaction rating. Such an underlying assumption, some could argue, is more consistent with the way things work in the real world. The following table shows how the concept of correlation analysis can be expanded to include multiple drivers of the revenue provided by each guest.
A regression fitted to the data in the following table would confirm that every point improvement in guest satisfaction still leads to an incremental $100 of revenue. What the analysis also provides as new information is that, for a given level of satisfaction with the hotel stay, total annual revenue also depends on the number of annual business trips taken by the guest. In this case, every incremental business trip gives the hotel chain an opportunity to make an extra $50 from the guest.
Guest satisfaction (1–10 scale) | Number of business trips taken annually | Annual revenue ($) provided by guest |
1 | 5 | 350 |
2 | 5 | 450 |
3 | 5 | 550 |
4 | 5 | 650 |
5 | 10 | 1000 |
6 | 10 | 1100 |
7 | 10 | 1200 |
8 | 10 | 1300 |
9 | 20 | 1900 |
10 | 20 | 2000 |
This leads to the second of the two key differences between regression and correlation. Unlike the correlation coefficient, an estimated regression parameter is not constrained to lie between minus one and plus one; it is expressed in the units of the outcome, here dollars of revenue. Consequently, the research team can estimate the real financial impact of improving guest satisfaction, as well as that of targeting more frequent business travelers. This allows management to hypothesize and evaluate alternate scenarios, such as the impact of shifting focus to frequent travelers or the potential benefit of improving the current guest satisfaction rating by one scale point.
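The $100 and $50 figures can be recovered directly from the three-column table. The sketch below, in standard-library Python, fits an ordinary least squares regression with an intercept by solving the normal equations. The helper names `solve_linear_system`, `fit_least_squares`, and `predicted_revenue`, and the two scenarios at the end, are illustrative assumptions; in practice a statistics package would be used instead.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_least_squares(predictor_rows, outcome):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + [float(v) for v in row] for row in predictor_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Data copied from the table above.
satisfaction = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
trips = [5, 5, 5, 5, 10, 10, 10, 10, 20, 20]
revenue = [350, 450, 550, 650, 1000, 1100, 1200, 1300, 1900, 2000]

intercept, per_point, per_trip = fit_least_squares(
    list(zip(satisfaction, trips)), revenue
)


def predicted_revenue(satisfaction_score, annual_trips):
    """Revenue implied by the fitted model for one guest profile."""
    return intercept + per_point * satisfaction_score + per_trip * annual_trips


# Hypothetical scenarios: one extra satisfaction point, or ten extra trips.
gain_from_satisfaction = predicted_revenue(8, 10) - predicted_revenue(7, 10)
gain_from_frequent_traveler = predicted_revenue(7, 20) - predicted_revenue(7, 10)

print(round(per_point, 2), round(per_trip, 2), round(gain_from_frequent_traveler, 2))
```

Because the table is constructed without any noise, the fitted coefficients should land on roughly 100 dollars per satisfaction point and 50 dollars per trip, with an intercept near zero. The scenario comparisons at the end are the kind of what-if evaluation that unconstrained, dollar-denominated parameters make possible.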
Structural Equations Modeling
Structural equations modeling (SEM) is a relatively recent addition to the analytical tool kit of marketers. It is similar to regression analysis in that it is a multivariate technique, and therefore has the ability to examine covariation among more than two measures as part of one overall model. There are, however, three important differences between regression analysis and SEM. One, SEM allows an analyst to work with latent constructs. An easy way to explain a “latent construct” is that it is a concept or phenomenon that we cannot directly observe and are therefore unable to directly measure. Customer satisfaction is a good example of a latent construct. Theoretically speaking, satisfaction measures a customer’s overall evaluation of their consumption experiences. However, one cannot directly observe something like “customer satisfaction.” Therefore, we design a set of proxy measures to tap the concept. This often requires multiple measures that all tap into the common concept that we like to label as “customer satisfaction.” In a survey, for example, we might ask customers to provide their responses on, say, a 1 to 10 scale on multiple measures of satisfaction such as “overall satisfaction,” “expectations being exceeded,” and “proximity to the ideal brand.”
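The idea of several proxy measures tapping one unobservable concept can be illustrated with a small simulation. The sketch below is not an SEM estimation. It assumes a hypothetical latent satisfaction score on a standardized scale (rather than the 1 to 10 survey scale), generates three noisy indicators of it with made-up noise levels, and compares how well a single indicator and the average of all three track the latent score. All names and numbers are assumptions for illustration only.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def noisy_indicator(latent_scores, noise_sd=1.0):
    """Simulate one survey item: the latent score plus independent noise."""
    return [score + random.gauss(0.0, noise_sd) for score in latent_scores]


random.seed(7)  # fixed seed so the simulated data are reproducible

# Hypothetical latent satisfaction for 2,000 simulated customers.
latent = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Three simulated proxy measures of the same underlying concept.
overall_satisfaction = noisy_indicator(latent)
expectations_exceeded = noisy_indicator(latent)
proximity_to_ideal = noisy_indicator(latent)

# A simple composite: the average of the three indicators.
composite = [
    (a + b + c) / 3
    for a, b, c in zip(overall_satisfaction, expectations_exceeded, proximity_to_ideal)
]

r_single = pearson_r(latent, overall_satisfaction)
r_composite = pearson_r(latent, composite)
r_between_items = pearson_r(overall_satisfaction, expectations_exceeded)

print(round(r_single, 2), round(r_composite, 2), round(r_between_items, 2))
```

In this simulation the composite should track the latent score more closely than any single item, because the independent noise in each item partly averages out. In a real study the latent score is never available for comparison, which is why a technique that models it explicitly is useful.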
The second important difference between SEM and regression analysis is that the former acknowledges and incorporates measurement error into its models. In the interest of keeping the discussion nontechnical, we will skip the details and encourage interested readers to consult suitable texts on the topic. We would, however, make the point that such error is widespread in the social sciences and can often dilute the estimated relationships. Accounting for measurement error, as SEM provides for, allows the research team to recover the strength of relationships among the variables of interest that such error would otherwise understate. Last but certainly not least, SEM allows the research team to specify a structure of relationships among various measures, which is something regression analysis cannot do. Imagine that the team believes that, in the market of interest, favorable pricing perceptions lead to more positive value perceptions, which in turn lead to greater customer satisfaction. SEM allows us to design and estimate such a series of relationships through one model. Regression analysis, on the other hand, will estimate the impact of price perceptions and value on customer satisfaction without recognizing that price perceptions may themselves be driving value.
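The pricing example can be made concrete with simulated data. The sketch below is a crude stand-in for a path model, not a full SEM estimation: it ignores latent constructs and measurement error and simply chains least squares slopes. The path strengths (0.6 and 0.7), the noise levels, and all function and variable names are hypothetical values chosen for illustration.

```python
import random


def cross(x, y):
    """Sum of centered cross-products of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))


def slope(x, y):
    """Least squares slope from regressing y on the single predictor x."""
    return cross(x, y) / cross(x, x)


random.seed(11)  # fixed seed so the simulated data are reproducible
n = 5000

# Hypothetical chain: price perceptions -> value perceptions -> satisfaction.
price = [random.gauss(0.0, 1.0) for _ in range(n)]
value = [0.6 * p + random.gauss(0.0, 0.8) for p in price]
satisfaction = [0.7 * v + random.gauss(0.0, 0.7) for v in value]

# Path-style view: estimate each link of the chain separately.
price_to_value = slope(price, value)
value_to_satisfaction = slope(value, satisfaction)
indirect_effect = price_to_value * value_to_satisfaction

# Total association between price perceptions and satisfaction.
total_effect = slope(price, satisfaction)

# Single regression of satisfaction on price and value together
# (closed-form two-predictor least squares on centered data).
s_pp, s_vv, s_pv = cross(price, price), cross(value, value), cross(price, value)
s_ps, s_vs = cross(price, satisfaction), cross(value, satisfaction)
det = s_pp * s_vv - s_pv ** 2
direct_price = (s_ps * s_vv - s_vs * s_pv) / det
direct_value = (s_vs * s_pp - s_ps * s_pv) / det

print(round(indirect_effect, 2), round(total_effect, 2), round(direct_price, 2))
```

In this simulation the single regression should assign price perceptions a coefficient close to zero, since all of their influence runs through value, while the chained estimates show a sizable indirect effect. That is the pattern the text describes: a single regression cannot by itself reveal that price perceptions may be driving value.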