In the previous chapter, we discussed the main steps in building a multi-item reflective measurement scale. However, these steps are not sufficient to capture all the decisions to be made in this process: they specify the general procedure to be followed, but say nothing about which item formulations may be better than others, or which response formats may make data collection and analysis more efficient. The quality of an instrument is largely determined by the number of items, the labels (words) used, the formulation of the items, the number of response modalities, the numbering associated with them, the presence or absence of response labels and/or numbers, the type of scale (Likert, semantic differential, etc.), and so on.
Reflection on the scale to be built requires more than knowledge of the stages through which the researcher must pass and of the statistics to be applied. However, researchers often do not pay enough attention to the design of a scale, wrongly believing that purification and validation processes can identify all the limitations and shortcomings to be corrected. While statistics (reliability tests and factor analyses) make it possible to assess the psychometric qualities of a scale, they do not replace the researcher's judgment about the ability of a measurement design to come close to the realities experienced by respondents while avoiding certain biases. This reflection must be carried out carefully from the beginning of the scale construction process, because once the fieldwork has begun, it is difficult to go back. While revisions are possible in order to make adjustments, they are likely to cost time and require additional effort.
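The reliability tests mentioned above can be illustrated with a minimal sketch. Cronbach's alpha, a common internal-consistency statistic for a multi-item reflective scale, is computed from a respondents-by-items score matrix; the data below are hypothetical 5-point Likert responses invented for the example.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items score matrix."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 6 respondents, 4 items on a 1-5 Likert scale
data = np.array([
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 4, 3],
])
print(round(cronbach_alpha(data), 3))  # → 0.952
```

This is only a sketch of the statistic, not a substitute for the judgment discussed above: a high alpha on invented data says nothing about whether the items fit the realities experienced by respondents.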
Decisions relating to the design of a measuring scale do not exclusively concern the development of new scales, but also the adaptation of an existing scale, which can concern all of the same aspects (response formats, labels, deletion and/or addition of items, etc.). In addition, many of the considerations we will discuss in this chapter (e.g. the formulation of indicators, the number of response modalities, etc.) do not relate exclusively to reflective measures, but also to formative specifications. In this chapter, we will review some of the components of a scale, particularly item formulation, numbering, response formats (number and labels) and the presentation of some of the attributes of a scale.
Often, the researcher is in doubt about the number of items to be included in a scale. These items must also be assessed on the basis of a number of response modalities that covers respondents' perceptions or opinions. For both of these quantitative attributes of a scale (number of items and number of response choices), a set of considerations helps determine an optimal number.
The idea that a single-item scale is sufficient to capture the least abstract constructs is sometimes accepted. However, this point is debatable, and scales with multiple indicators remain the most frequent choice, despite the disadvantages they generate (cumbersome administration, high cost, etc.). Without reopening the discussion in Chapter 3 (section 3.3) on the choice between a single-item and a multi-item measure, efforts should be made to adjust the length of the scale so that its advantages can be maximized. It should be remembered that arbitrating the number of indicators to be retained is a concern specific to reflective scales. In the following, we will first outline a few proposals for the number of items to be retained, and then review some criteria to facilitate this choice.
Having summarized the contributions and limitations of single-item versus long multi-item scales, Malhotra et al. (2012) proposed a sequence to follow in searching for the optimal number of items between these two possibilities, without specifying one in particular. On this point, the literature is not abundant. Akrout (2010, p. 112) points out that three items are recommended to properly represent a latent variable. Diamantopoulos et al. (2012) propose four items in order to make comparisons and test a measurement model. Bagozzi (2011) also retains four items for a relevant and interesting estimate of a reflective measurement model, adding that this applies to one-factor (unidimensional) models.
According to Bagozzi (2011), for models with two or more factors, at least two indicators per dimension are needed to avoid ambiguity. But he assumes that it would be better to retain three or more indicators. Mowen and Voss (2008, p. 499) “recommended that researchers set a goal of developing scales that have between four and eight items; further, if a scale has dimensions, each dimension should have from three to five items”. Hinkin et al. (1997) recommend generating, at the very beginning of the scale development process, twice as many items as necessary, that is four to six items for each construct, suggesting that only half of them will be retained in the next steps.
It is clear that none of the various proposals mentioned constitute a rule to be adopted in absolute terms. When choosing the number of items to be retained, the researcher must take into account several considerations:
The researcher’s judgment should preferably focus on all these determinants in order to optimize the number of items that will be definitively selected. Empirical data and psychometric properties sometimes force them to accept a non-optimal solution according to some of the methodological considerations mentioned above. It should be remembered, however, that it is possible to move back and forth between the different methodological decisions in order to harmonize the whole.
The debate here relates to the question of the number of points (response modalities) to be retained. Some support the choice of a large number, others a small number. The controversy is essentially engaged in the following dilemma: discrimination of responses versus response bias. In this section, the discussion will focus first on the results of research dealing with the consequences of different formats (number of points) and their determinants, then on proposals for the number of response modalities, and finally on some recommendations and criteria that may facilitate the choice of an optimal number of response modalities.
We will review some research findings. Some support the merits of a large number of response modalities, others point out the disadvantages, while others present contrasting results:
This clearly suggests that increasing the number of response modalities may not, in some situations, generate a greater wealth of information and, at the same time, may not systematically create a high burden during data collection and processing. How is it then possible to understand the contrast in these different results? In fact, discussions about the optimal number of response choices are made from two main perspectives: the researcher and the respondent. The observation of the debates under way clearly underlines that there does not seem to be agreement on an optimal number of modalities. It appears from the results of several investigations that this number can only be decided under certain conditions:
Starting from different angles of analysis, some proposals regarding the number of scale points can be put forward. Some start from a few criteria, including those mentioned above; others discuss the issue of whether or not to include a neutral position:
Faced with these contrasting results, should a central point be included in the scale and therefore have an odd number of response modalities, or would it be desirable to force the respondent to provide a response using a symmetrical scale with an even number of responses, in order to prevent them from resorting to the neutral category to hide their opinion? The answer to this question is dictated in part by the studied population and the researcher’s objectives. When some respondents may not have opinions to express on the subject, an odd number is preferred. However, if the researcher has reason to believe (but must clearly justify) that some respondents could potentially hide their true opinions (for example for a more “taboo” subject) and choose the neutral mode to avoid disclosing them, then an even number is to be favored. Finally, it should be noted that the purpose of the measure may favor an even or odd number of choices: for example, measuring the roles played by family members (parent or child) in a purchasing decision easily lends itself to a central position that conveys the idea of joint roles between members as opposed to separate roles taken at the initiative of one or the other.
Regardless of the perspective used (that of the researcher or respondent), to choose the number of modalities, the benefits of a specific number are often thwarted by inherent limitations. This section will attempt to shed some light on this point by examining two main aspects: (1) the number of response modalities in a questionnaire, and (2) the criteria for choosing the number of response modalities in a scale.
Whatever the recommended decision, it is desirable to harmonize the number of points on the scales within the same questionnaire when this does not alter the understanding of the items and the content of a scale. Indeed, it is not uncommon to observe the use of different response formats (in terms of the number of points) in the same survey and among the same respondents, particularly when establishing nomological validity. This is likely to generate many difficulties:
The choice of a number of response points must be justified. To this end, efforts must be made to take into account the specificities of the research undertaken. Indeed, the number of modalities can only be decided after examining other aspects of the research design:
Beyond the considerations surrounding the number of items and the number of response choices to be used in a scale, it must be noted that there are many qualifiers (words, labels, formulations) to consider. The researcher must decide on the content most likely to facilitate respondents' understanding and engagement, so that they report their assessments of the phenomenon as accurately as possible. In this section, we will examine the verbal components of a scale, by which we mean two main aspects: the formulation of items and the labels (if any) associated with the response modalities. These two components actively shape the design of a scale and largely determine the operational quality of a measure. In the first section, we will examine different item formulation formats and some of their consequences in order to identify a set of practical recommendations. In the second section, we will focus on the labels of the response categories proposed to respondents for expressing their attitudes toward the items.
A critical look at the various results of the discussions undertaken on this subject is of great importance because, first, it makes it possible to highlight observations about the formulations of items to be retained, and second to formulate some recommendations so as to arbitrate the choice of formulations.
Apart from the clarity and precision needed when choosing indicators for a construct, the formulation of the items has intrigued many researchers, who are often confronted with the problem of whether or not to include reversed items. A major decision is required: should negative (reversed) and positive items be included in the same scale? The answer to this question has been divisive, and agreement on the need (or not) for such use seems to be lacking. Each format has potential advantages but also many limitations. As such, using both formats on the same scale seems useful in order to allow better discrimination of items while avoiding some response bias. Response bias generally refers to responses that do not reflect the opinions of respondents, disrupting the validity of a measure. This concerns in particular the tendency of an individual to respond systematically in the same way to items on a scale.
This trend gives rise to quite varied styles, such as the tendency to agree, the tendency to disagree, the tendency to use extreme responses, the tendency to use midpoint responses, etc., which may have several causes such as cultural specificities (Bartikowski et al. 2006; Roster et al. 2006; Dolnicar and Grün 2007; De Jong et al. 2008; Peterson et al. 2014) and the sociodemographic characteristics of the respondent (Greenleaf 1992; Peterson et al. 2014). These styles can affect the validity of research findings (Baumgartner and Steenkamp 2001). It is therefore crucial to pay close attention to this type of bias when adopting, adapting and building a measurement scale during the item formulation phase.
In this section, we will report on four main aspects associated with mixed items: (1) the types of item formulation, (2) the problems created by the introduction of reversed items, (3) the causes generating the problems of using mixed items, (4) the solutions recommended to solve the problems of using mixed items.
Clearly, a variety of item formulations are possible. Weijters and Baumgartner (2012) specify that an item can be formulated either as an affirmation or as a denial of something. Swain et al. (2008) note that since scales are often used to measure the existence of a phenomenon (behavior, attitude, etc.) rather than its absence, most regular items are formulated as affirmations while most reversed items are formulated as negations. Chang (1995) adds that an item can be negative either through an explicit negation ("not", etc.) or through a grammatically positive form with a negative meaning (e.g. "this building is poorly designed"); conversely, a grammatically negative sentence (e.g. "I am not sad") is not necessarily a negative item, since some words and items carry a negative connotation while others carry a positive one. Sonderen et al. (2013) observe that, in general, two strategies can be considered to reverse an item: the first is to add a negation (e.g. "not"), the second is to use opposite words (e.g. "large" versus "small"). Swain et al. (2008) find that to reverse an item, it is possible to reverse its linguistic polarity, which refers to the fact that the item acquires either a positive polarity (an affirmation) or a negative polarity (a negation). Weijters and Baumgartner (2012) add that several types of negation are possible (verb negation, adjective negation, etc.) and that reversed items can be formulated in different ways. As an illustration, they note that a reversed item can be defined as an item whose meaning is opposite to the chosen standard of comparison (construct, item, respondent). Faced with the different potential definitions of a negative (or reversed) item, Chang (1995) notes that a positive or negative label implies a value judgment that is difficult to generalize across time and cultures. He proposes instead the terminology of an item that is coherent or incoherent with the majority of items on a scale.
More generally, the objective of using mixed items is to alert inattentive respondents to the variations that exist between indicators, in the hope of circumventing some of the above biases. However, this way of proceeding (scale with mixed items) can lead to:
Thus, some believe that the format of positive (regular, non-reversed) items is superior to other formats, as differently formulated items are unlikely to yield consistent information (Schriesheim and Eisenbach 1995; Weems et al. 2003). Reversed (negative) items, which are supposed to slow cognitive processing in order to elicit more careful responses, in fact damage measurement. According to Sonderen et al. (2013), using mixed items is counterproductive because it increases confusion and cognitive fatigue and ultimately leads to more response bias without any significant benefit.
Attempts were made to understand the causes of problems when using mixed items (positive and reversed). As an indication, it would seem that:
Is the validity of a scale improved:
The reflection on whether or not to use reversed items has been extended to the question of how many items of each type (negative and positive) to retain in a scale. For single-item Likert measurements, Alexandrov (2010) notes the superiority of positive formulation. To avoid the problems associated with mixing the two formats in multi-item scales, some recommend an equal number of positive (regular) and reversed (negative) items. Comprehension biases seem to be better controlled when the scale is balanced; an unequal number is problematic (Roszkowski and Soven 2010). However, Chen et al. (2007), although they used a scale balanced in the number of items of each type (4 positive and 4 negative), still found two factors following a principal components analysis, each of which included either the positive or the negative items. Dickson and Albaum (1975) find that the responses for three formats of a semantic differential scale with different polarities (positive, negative and balanced) are not significantly different.
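In practice, before any reliability or factor analysis of a mixed scale, the reversed items must be recoded so that all items point in the same direction. A minimal sketch, using hypothetical 5-point Likert data and hypothetical column positions for the reversed items:

```python
import numpy as np

# Hypothetical responses: 3 respondents x 4 items on a 1-5 Likert scale;
# columns 1 and 3 (zero-based) are assumed to be the reversed (negatively
# worded) items of the scale
responses = np.array([
    [5, 1, 4, 2],
    [4, 2, 5, 1],
    [2, 4, 1, 5],
])
SCALE_MIN, SCALE_MAX = 1, 5
reversed_cols = [1, 3]

recoded = responses.copy()
# Standard recoding rule: recoded score = (scale min + scale max) - observed score
recoded[:, reversed_cols] = (SCALE_MIN + SCALE_MAX) - responses[:, reversed_cols]
print(recoded)
```

The recoding is purely arithmetic; as the results of Chen et al. (2007) suggest, it does not by itself remove the method factors that mixed wording can introduce.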
Other lines of reasoning, exploring the importance of mixed items, have revealed that the problem of whether or not to integrate reversed items probably lies elsewhere, and more particularly in the multiplicity of possible item formulations. Several types of negation and inversion are possible, and an item can be reversed without containing a negation, and vice versa. Each type of item can have different consequences on the answers obtained. Weijters and Baumgartner (2012) suggest that negations should be used sparingly; according to them, there is little benefit in using negations that do not produce a reversed item. Roszkowski and Soven (2010) share this view, believing that negative items cause confusion in the minds of respondents and are therefore only useful when respondents are inattentive. According to Chang (1995), items with different connotations (coherent/incoherent) do not necessarily capture the same construct. He suggests using only items whose connotations are consistent with the rest of the items on a scale and with the construct being measured, even if the objective is to circumvent response biases.
On the other hand, for Weijters and Baumgartner (2012), the use of reversed items (not negative) has the particular advantage of encouraging better coverage of the domain of the construct. Although reversed (oppositely worded) items probably produce artificial factors, their elimination is not always necessary (Spector et al. 1997) and may mask inconsistent responses; they should therefore be used with caution (Weijters and Baumgartner 2012). Hartley (2013) points out, on the basis of a body of work, three particular findings: first, it is difficult to write exactly equivalent terms in a positive and negative form; second, the respondent has difficulty reversing their thinking (understanding) in order to provide an assessment of the items; third, different ratings are given for the positive and negative versions of the items. For Hartley (2013), two options are then possible: remove negative items from a scale or present the results separately for such items.
From the controversy over the need (or not) to use a mix of items, it is clear that some endorse such use for its ability to absorb certain biases, while others assume that it is only a decoy and that such a mix can promote other biases.
We believe that, in any case, reversed items will probably always exist according to the standard of understanding chosen, whether this is to clarify certain phenomena (for example, positive and negative emotions) or whether it is in the minds of some respondents who have points of view opposed to the idea proposed by the item. In addition, the items (regular and reversed) may take different formulations for the same idea:
Negative formulations seem to be the least successful and quite often deserve to be discarded. Nevertheless, each format can have many consequences (positive and negative). These can be grouped according to several logics (statistical, conceptual, etc.). The concern, therefore, in our view, is not whether or not to use a particular type of item, but rather to focus on the merits that a type of formulation could bring to a scale. The objective of the final choice is to promote responses that reflect both respondents’ perceptions and the latent constructs to be measured. In other words, while fitting as closely as possible to the field of the operationalized construct, the measure must provide as accurately as possible the respondents’ position on the dimensions of the construct while minimizing any forms of bias.
Whatever the formulation of the items in a measurement scale, it is imperative to carry out a set of preliminary examinations:
These various recommendations should not be seen as a succession of fixed steps, but as a set of warnings. Moreover, they depend largely on the context of the study, the construct to be operationalized, and can be mobilized in different ways. It is sometimes possible, during the same test, to collect information in order to verify both the understanding of item formulations and to statistically test the structure of a scale or even test other variables of the scale design (number of items, response modalities, etc.). In addition, in intercultural investigations, additional checks on the equivalence of words, labels, facets of behavior, etc., are crucial. These recommendations are to be used as a guide for adjusting the formulations of the items to be gathered.
The words used in the response modalities may be equally important to consider. Each label may indeed be perceived or understood differently by respondents, particularly in different language contexts, favoring certain categories of responses rather than others. Generally, items are evaluated on the basis of response labels composed of positive and negative assessments, which are supposed to be opposed. As such, Rozin et al. (2010) note that some positive words do not have opposing negative words in some cultures and languages. Bartikowski et al. (2006) point out that a word translated into another language may be lexically equivalent, but may have a different meaning. Seeking equivalence in the formulation of items without regarding response modalities can have serious consequences on data quality. It should be noted that the considerations relating to the labels to be used are not exclusively related to intercultural research projects. Labels may also give rise to different interpretations by respondents within the same cultural context.
Although some researchers, such as Cox (1980) and Wildt and Mazis (1978), long ago suggested that the labels chosen have effects on responses, this subject has remained relatively neglected in favor of other aspects of scale design. However, in order to assess the idea underlying an item, the respondent relies heavily on the proposed response options. Lam and Stevens (1994) noted in this regard that the content of items should not be examined apart from response labels, and observed that different item-label formats can lead to different results. If the labels, being associated with all items on a scale, are ambiguous, inappropriate, etc., they can pose serious problems affecting the representation of the construct under study.
Among the few studies undertaken on this subject, Revilla (2015, p. 235) found that the use of labels mentioning "higher frequencies or durations, increases the proportion of respondents reporting higher frequencies or higher time spent on the corresponding activities". In addition, Weijters et al. (2013) analyzed, in different multilingual cultural contexts, the effects of two aspects of a label: intensity (extreme responses) and familiarity (use in the language). They found that the choice of labels has consequences on responses: on the one hand, the higher a label's intensity as perceived by respondents, the more the corresponding response is treated as extreme; on the other hand, the more familiar the labels are to respondents, the more frequently they are used. This leaves little doubt that clear labels, carrying the same meaning for all respondents, should be used.
It should also be noted that it is interesting to question the label of the central point when the scale has an odd number of response modalities. For a Likert-type format, for example, should the neutral, don't know, indifferent or neither agree nor disagree label be used? Do each of these formulations lead to different perceptions? As such, although the neutral and neither agree nor disagree options seem to refer to the same level of agreement, don’t know or indifferent probably result in a lack of opinion. Although some researchers have not observed any significant differences between some central labels, such as Armstrong (1987) who compared two labels: neutral and undecided, it is crucial to consider whether different labels can be perceived and interpreted differently.
The verbal content of the items, their number, etc. are not the only supports on which respondents base their answers. The visual language and the layout of the different response options can be decisive. Even with properly formulated items and sufficient, understandable response modalities, a poorly adjusted configuration and marking of the choices available to respondents can hinder responses. First, we will examine some of the non-verbal attributes involved in the design of a scale, attention to which can at least help obtain reliable and valid answers to the phenomena to be measured. Second, we will point out the value of examining them.
The items composing a scale can be evaluated using a multitude of formats: Likert, semantic differential, etc., as we pointed out in the first chapter. These formats include, among others, labels and/or numbers. That said, labels such as strongly agree, strongly disagree, ratings such as “1”, “5”, “-2”, “+2”, can be considered. Which position (right or left) to choose, then, for each of the labels and numbers selected? Should labels be associated with all the possible answers proposed or exclusively presented at the extreme points of a scale? Similarly, should numbering accompany all or some of the labels? It is easy to observe that these kinds of questions do not receive much attention. Nevertheless, research has shown that they are factors that can modify respondents’ appreciation of the indicators (items). In the following, we will discuss some of these attributes, in particular considerations of order, association and visual presentation.
The importance of the position of response labels in the expression of individuals' choices on a scale has long been noted (e.g. Belson 1966; Wildt and Mazis 1978). Hartley and Betts (2010) and Betts and Hartley (2012) found that different arrangements of the position of positive labels on a (Likert) scale and of the associated numerical values produce different effects. More precisely, they observed that with positive labels and high scores on the left of a scale, the scores obtained are significantly higher than those resulting from other combinations. Betts and Hartley (2012) identify several possible explanations, noting more specifically a visual attention bias toward the left, linked to the usual left-to-right reading direction of native English speakers. Chan (1991), using a Likert scale to vary the position (left or right) of positive ("describes me very well") and negative ("does not describe me well") response labels, observed different means and estimates of the latent trait depending on the format tested. More specifically, he found that the positive-on-the-left format yielded a better estimate of the trait.
But can this order effect vary across cultures? For example, in Arabic-speaking countries where reading is done from right to left, does a positive format on the right become necessary? For other countries, such as Japan or China, where reading can take place in other directions, for example, from top to bottom, can the previous horizontal presentation format (left or right) be relevant to the evaluation of an item? In addition, it is often possible to observe, when administering a scale, a presentation of items in two or more languages (e.g. Arabic, French, English) in order to reach a larger sample of the population. Should the items, response labels (positive or negative) and their associated numbers (high or low) be adjusted on the right or left? Although it is probably easier to choose the position of these variants when it comes to the use of languages with the same reading direction, the choice seems more complicated when it comes to several languages written in opposite directions.
Similarly, the position of the numerical values assigned to the different response modalities may be assessed differently from one group of respondents to the next. This assessment can also be significantly affected by the construct to be measured. Indeed, some phenomena (setting aside the items that are supposed to capture them and their formulation) have an intrinsically negative connotation, while others have an intrinsically positive one. Respondents may expect to find the numbering (high or low) and response labels (positive or negative) on the left or on the right, depending on their perception of what best corresponds to the expression of their attitudes.
Apart from considerations of order, the numbers associated with the response modalities may also have different meanings depending on the respondents. Schwarz (1999) points out that a respondent’s interpretation of a scale may indeed vary depending on the meaning given to the association between a response label and a number. Schwarz et al. (1991), having retained labeled scales only at the ends, note that responses vary according to the numerical values assigned to the extreme points of a scale. Having compared the results obtained using a semantic differential scale in two formats: one with exclusively labels and the other with exclusively numberings, Dolch (1980) found that the two formats give rise to different factor structures, although the correlations between the items in the two formats are high.
Differences in perception can also be recorded between a format where only the extremities are labeled and one where all response modalities are labeled: the presence of labels only at the extreme points probably forces the respondent to undertake more elaborate cognitive processing in order to infer the meaning of the other points (Lantz 2013). Weng (2004) notes a trend toward greater reliability (test-retest procedure) when all points are labeled, compared to labels only at the ends of the scale. Similarly, Eutsler and Lang (2015) found that a scale where all modalities are labeled, compared to a scale labeled only at the ends, minimizes response bias, maximizes variance and lessens measurement error. Moors et al. (2014) recorded variations in extreme response styles by format, with respondents using extreme responses more frequently when response modalities are labeled exclusively at the ends (versus at all points). It appears, a priori, that clearly formulated labels for the response modalities lead to more stability in the understanding of the items to be evaluated. In addition, the presence of a label and a number for each modality seems to offer respondents a valuable guide in understanding and evaluating the items.
Moors et al. (2014) point out that bipolar numbering (for example, from “-3 to +3”), rather than positive numbering (for example, from “1 to 7”), can be difficult to use for scales evaluating agreement, since negative values can be confusing. Positive values (example: 1; 5) can denote the absence or presence of a phenomenon, while a mix of positive and negative values (example: -2; +2) can denote opposite poles (Schwarz et al. 1991). But this is, of course, not always the case: positive values (1; 5) can sometimes stand for aspects that are opposites.
For example, to measure the influence of the husband and the wife in a couple’s purchasing decisions, the number “1” can be associated with a dominant role for the wife and the number “5” with a dominant role for the husband (or vice versa): two positive numbers that reflect two opposite directions in the distribution of roles within the couple.
In addition, the nature of the phenomenon to be measured may favor one format over another. Respondents’ self-expression on items related to the construct of happiness, for example, is probably easier with an assessment grid (labels and numbers) combining positive labels with positive numbers (example: label “strongly agree”; score “5”) than with one associating a positive label (“strongly agree”) with a negative number (-2). Indeed, the latter combination (positive label, negative number) could, consciously or unconsciously, suggest a lack of happiness to the respondent, who might then, perhaps for reasons of social desirability, have difficulty expressing themselves because of the confusion generated by this association.
It should also be noted that the chosen response format (example: Likert or semantic differential) can constrain or facilitate the choice of numbering. With a semantic differential scale, for instance, the phenomenon is approached with reference to opposite poles, so it seems easier to use mixed values (positive and negative) that express this bipolarity more explicitly. Assigning a score of “+2” to the adjective “performing” and a score of “-2” to “non-performing” does not necessarily disturb the respondent’s understanding.
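As a minimal illustration, and assuming a 7-point scale, the two numbering conventions discussed above differ only by a constant shift, which the Python sketch below makes explicit (the function names are ours): any difference in responses between the formats therefore stems from respondents’ interpretation of the numbers, not from their arithmetic content.

```python
# Recoding between the two numbering formats discussed above.
# Assumption: a 7-point scale collected either as bipolar (-3..+3)
# or as positive (1..7); the mapping is a simple linear shift.

def bipolar_to_positive(responses, offset=4):
    """Shift bipolar codes (-3..+3) to positive codes (1..7)."""
    return [r + offset for r in responses]

def positive_to_bipolar(responses, offset=4):
    """Shift positive codes (1..7) back to bipolar codes (-3..+3)."""
    return [r - offset for r in responses]

bipolar = [-3, -1, 0, 2, 3]
positive = bipolar_to_positive(bipolar)
print(positive)                                  # [1, 3, 4, 6, 7]
print(positive_to_bipolar(positive) == bipolar)  # True
```

Because the recoding is lossless in both directions, any format effect observed in the literature reviewed here reflects how respondents read the numbers, not what the numbers encode.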
Other aspects can be decisive in the architecture of a scale and the arrangement of its components: images accompanying the verbal (and/or numerical) descriptions of response modalities, symbols (boxes, hyphens, etc.) marking response choices, the presentation of the scale on the page (example: portrait or landscape orientation, one or more items per page, a vertical or horizontal layout of response modalities), and so on. These elements contribute significantly to the visual appeal of the scale. Although studies of these attributes are relatively few, Tourangeau et al. (2004) observed that respondents’ interpretation of items is influenced by visual presentation attributes such as the spacing between response options, the order of response options and the grouping of items. They found that these attributes affect respondents’ choices and response times.
Visual attributes seem most relevant in self-administration, especially via the web. In that context, Van Schaik and Ling (2007), having tested different scale formats, observed that each has certain advantages and disadvantages according to a variety of criteria (e.g. respondents’ preferences, spontaneity of responses, response time), even if the psychometric qualities resulting from the different formats can be comparable. Deutskens et al. (2004) observed that a long questionnaire with images (versus text only) reduces the response rate; according to them, this is probably due to the time it takes to download the photos, which leads respondents to abandon the questionnaire. However, they also noted that with the visual mode respondents use the “don’t know” response option less often; once the subject of the question is visualized, respondents are probably better able to report their opinions. Are photos, animations, colors, etc. assets that guide respondents, or will they disturb their understanding of the content and even lead them to abandon the questionnaire?
It is clear that different aspects of the design of a scale can lead to different assessments by respondents. We can even argue that inconsistent conclusions across research projects are due, among other things, to the design of the measurement scales and more particularly to some of their components (labels, numbering, presentation, etc.). Admittedly, other considerations should be taken into account to test the robustness of these conclusions, such as the respondent’s tendency to express themselves in a certain way (positive or negative), their response style (example: a tendency to favor extreme responses) and the mode of administration (face-to-face, telephone, etc.). In any case, when hypothesized links between phenomena are tested with scales that differ in format (types of labels, visual aspects, etc.), the quality of the results can be affected.
Without dwelling on this point, it should be noted that a questionnaire composed of scales for different constructs with harmonized formats (type of scale, numbering, label-number association, position of labels, etc.) seems likely to reduce certain biases and improve the quality of the results. Admittedly, this is not always possible, particularly when the objective of the study is precisely to test different formats of a scale, or when the different constructs to be captured, particularly during the phase of establishing nomological validity, do not lend themselves to the same measurement formats. In any case, it is useful to reflect on these aspects in order to optimize the choices associated with the measurement tool.
The test of a scale should not be limited to checking that the items are understood, verifying their content or, at best, their psychometric properties, as is often the case. Different variants (position, numbering, etc.) affect respondents’ reactions to the items, and some formats may not be appropriate for certain cultures or respondent groups. Care must therefore be taken to choose formats that reduce comprehension and interpretation biases and yield reliable and valid results. To this end, an exploratory study can be used to identify the order of presentation most appropriate for respondents, that is, the one that best captures their feelings, opinions and attitudes. Individual or group interviews also allow interviewees to react to the subject under study and show how they verbalize their attitudes. Finally, even if it is difficult to control all the components of a scale, knowing the consequences they can generate is very important when interpreting the results obtained with the scale.
It is simplistic to rely exclusively on the scale construction steps described in the previous chapter without paying attention to the specific design of the items, their formulation, the number of points to retain, and so on. In this chapter, we have focused on the main components of a scale. Without revisiting all the aspects discussed, we would insist that the development of a scale is not a fixed succession of steps consisting of defining, purifying and validating a set of indicators covering a construct. Producing a reliable and valid measure requires much deeper reflection on all the attributes of a scale, and no aspect should be neglected or treated as minor. Indeed, all the components of a scale contribute to building a tool that promotes an understanding of reality as it is experienced or felt by the individuals in whom the researcher is interested.
Although the implications of some of the aspects studied have not yet been fully explored in the literature, it remains essential when constructing scales to be aware of the importance of quantitative components (number of items, number of response modalities) and qualitative components (verbal and visual) as determinants that can affect responses and thus the reliability and validity of the data obtained. When choosing the design of a scale, it is important to take the time to check whether certain attributes can generate biases, so as to neutralize them. Of course, responses that highlight real differences between individuals in terms of behaviors and evaluations specific to the construct under study are not biased; those resulting from a measurement protocol inadequate for capturing the reality around a construct often are. Indeed, the measuring instrument itself can lead to inaccurate, confusing and incorrect answers.