Introduction: The Basics of Scale Reliability

You can compute coefficient alpha when you have administered a multiple-item rating scale to a group of participants and want to determine the internal consistency of responses to the scale. The items constituting the scale can be scored dichotomously (e.g., as “right” or “wrong”), or they can have a multiple-point rating format (e.g., participants respond to each item using a 7-point “agree-disagree” rating scale).

This chapter shows how to use the SAS PROC CORR procedure to compute the coefficient alpha for the types of scales that are often used in social science research. However, this chapter does not show how to actually develop a multiple-item scale for use in research. To learn about recommended approaches for creating summated rating scales, see Spector (1992).

Example of a Summated Rating Scale

A summated rating scale usually consists of a short list of statements, questions, or other items to which participants respond. Very often, the items that constitute the scale are statements, and participants indicate the extent to which they agree or disagree with each statement by selecting a response on a rating scale (e.g., a 7-point rating scale in which 1 = “Strongly Disagree” and 7 = “Strongly Agree”). The scale is called a summated scale because the researcher typically sums the responses to the individual items to create an overall score on the scale. These scales are often referred to as Likert-type scales.

Imagine that you are interested in measuring job satisfaction in a sample of employees. To do this, you might develop a 10-item scale that includes items such as “In general, I am satisfied with my job.” Employees respond to these items using a 7-point response format in which 1 = “Strongly Disagree” and 7 = “Strongly Agree.”

You administer this scale to 200 employees and compute a job satisfaction score for each by summing his or her responses to the 10 items. Scores can range from a low of 10 (if the employee circled “Strongly Disagree” for each item) to a high of 70 (if the employee circled “Strongly Agree” for each item). Given the way these scores were created, higher scores indicate higher levels of job satisfaction. With the job satisfaction scale now developed and administered to a sample, you hope to use it as a predictor or criterion variable in research. However, the people who later read about your research are going to have questions about the psychometric properties of responses to your scale. At the very least, they will want to see empirical evidence that responses to the scale are reliable. This chapter discusses the meaning of scale reliability and shows how SAS can be used to obtain an index of internal consistency for summated rating scales.
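To make the scoring concrete, the following SAS DATA step is a minimal sketch of how such a total score might be computed. The data set names (RAWDATA, JOBSAT) and the item names (Q1 through Q10) are assumed here purely for illustration; they are not part of the example above.

DATA JOBSAT;
   SET RAWDATA;                /* hypothetical data set containing the 10 items Q1-Q10 */
   SATIS = SUM(OF Q1-Q10);     /* total job satisfaction score (possible range: 10-70) */
RUN;

Note that the SUM function ignores missing item responses; if you want the total score to be missing whenever any item is missing, use Q1 + Q2 + ... + Q10 instead.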

True Scores and Measurement Error

Most observed variables measured in the social sciences (e.g., scores on your job satisfaction scale) consist of two components: a true score, which reflects where the participant actually stands on the variable of interest, and measurement error. Almost all observed variables in the social sciences contain at least some measurement error, even variables that seem to be objectively measured.
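In the standard notation of classical test theory (background notation, not something computed in this chapter), this idea is written as

\[ X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2 \]

where X is the observed score, T is the true score, E is measurement error, and the variance decomposition assumes that errors are uncorrelated with true scores.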

Imagine that you assess the observed variable “age” in a group of participants by asking them to indicate their age in years. To a large extent, this observed variable (what the participants write down) is influenced by the true score component: how old they actually are. Unfortunately, the observed variable will also be influenced by measurement error. Some participants will write down the wrong age because they do not know how old they are, some because they do not want the researcher to know how old they are, and others because they did not understand the question. In short, it is unlikely that there will be a perfect correlation between the observed variable (what the participants write down) and their true scores on the underlying construct (i.e., their actual age).

This can occur even though the “age” variable is relatively objective and straightforward. If a question such as this is going to be influenced by measurement error, imagine how much more error results when more subjective constructs are measured (e.g., items that constitute your job satisfaction scale).

Underlying Constructs versus Observed Variables

In applied research, it is useful to draw a distinction between underlying constructs and observed variables. An underlying construct is the hypothetical variable that you want to measure. In the job satisfaction study, for example, you wanted to measure the underlying construct of job satisfaction within a group of employees. The observed variable, on the other hand, consists of the responses that you actually obtained. In that example, the observed variable consisted of scores on the 10-item measure of job satisfaction. These scores may or may not be a good measure of the underlying construct.

Reliability Defined

With this understanding, it is now possible to provide some definitions. A reliability coefficient can be defined as the percent of variance in an observed variable that is accounted for by true scores on the underlying construct. For example, imagine that in the study just described, you were able to obtain two scores for the 200 employees in the sample: their observed scores on the job satisfaction questionnaire and their true scores on the underlying construct of job satisfaction. Assume that you compute the correlation between these two variables. The square of this correlation coefficient represents the reliability of responses to your job satisfaction scale; it is the percent of variance in observed job satisfaction scores that is accounted for by true scores on the underlying construct of job satisfaction.
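Expressed as a formula, using the classical test theory notation introduced earlier, the reliability of the observed scores is

\[ \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \rho_{XT}^2 \]

that is, the proportion of observed-score variance attributable to true scores, which equals the squared correlation between observed scores and true scores.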

The preceding is a technical definition of reliability, but it is of little use in practice because it is generally not possible to obtain true scores for a variable. For this reason, reliability is usually defined in terms of the consistency of the scores that are obtained on the observed variable. An instrument is said to be reliable if it is shown to provide consistent scores upon repeated administration, across alternate forms, and so forth. A variety of methods for estimating scale reliability are used in practice.

Test-Retest Reliability

Assume that you administer your measure of job satisfaction to a group of 200 employees at two points in time: once in January and again in March. If responses to the instrument were reliable, you would expect that participants who provided high scores in January would tend to provide high scores again in March, and that those who provided low scores in January would also provide low scores in March. These results would support the test-retest reliability of responses to the scale. Test-retest reliability is assessed by administering the same instrument to the same sample of participants at two points in time and then computing the correlation between the two sets of scores.
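As a minimal sketch, that correlation could be obtained with PROC CORR. The data set name RETEST and the variable names SATIS_JAN and SATIS_MAR (total scale scores from the two administrations) are assumed for illustration:

PROC CORR DATA=RETEST;
   VAR SATIS_JAN SATIS_MAR;   /* total scores from the January and March administrations */
RUN;

The Pearson correlation between the two variables in the resulting output serves as the estimate of test-retest reliability.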

But what is an appropriate interval between administrations? Unfortunately, there is no hard-and-fast rule; the appropriate interval depends on what is being measured. For enduring constructs such as personality variables, test-retest reliability has been assessed over intervals as long as several decades. For other constructs, such as depressive symptomatology, the interval tends to be much shorter (e.g., weeks) because of the fluctuating course of depression and its symptoms. Generally speaking, the interval should not be so short that respondents can simply recall their earlier responses to specific items (e.g., less than a week), nor so long that it captures natural change in the construct itself (e.g., bona fide change in depressive symptoms). The former leads to an overstatement of test-retest reliability, whereas the latter leads to an understatement of it.

Internal Consistency

A further problem with the test-retest procedure is the time that it requires. What if you do not have time to perform two administrations of the scale? In such situations, you are likely to turn to reliability indices that can be obtained with only one administration. In research that involves questionnaire data, the most popular of these are the internal consistency indices of reliability. Briefly, internal consistency is the extent to which the individual items that constitute a test correlate with one another or with the test total. In the social sciences, the most widely used index of internal consistency is coefficient alpha, symbolized by the Greek letter α (Cronbach, 1951).[1]

[1] Usage of the Greek letter alpha (α) to represent an index of internal consistency should not be confused with the alpha used to specify significance levels for other statistical analyses described in this text.
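Coefficient alpha can be written as

\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right) \]

where k is the number of items, σ_i^2 is the variance of item i, and σ_X^2 is the variance of the total scale score. In SAS, coefficient alpha is requested with the ALPHA option of PROC CORR. The following is a minimal sketch using the assumed data set and item names (JOBSAT, Q1-Q10) from the earlier scoring example:

PROC CORR DATA=JOBSAT ALPHA NOMISS;
   VAR Q1-Q10;   /* the 10 items of the job satisfaction scale */
RUN;

The NOMISS option excludes participants with a missing response on any item, so that alpha is computed on complete cases only.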
