Appendix: Frequently Asked Questions
This appendix contains a collection of specific, practical questions that we are commonly asked. There are some big and important issues here, such as how to check whether a competency framework is suitable for assessment and whether it is possible to measure integrity. For the most part, you do not need to have read the rest of the book to understand the answers. But we suggest you look at the definition of validity in chapter 2 if you have not read it yet. The questions we address are:
If you would like to see more frequently asked questions and answers, or to pose a question of your own, you can do so on our blog (www.measuringtalent.com).
For the most part, we have tried to avoid using technical language in this book. This is particularly true when it comes to the issue of validity. To keep things simple, we have mainly discussed validity as if it is just one thing: the ability to predict future workplace success. Yet this is only one very specific gauge of validity, and there are others that it might be useful to know about.
We hinted at this in chapter 2 when we described two basic quality checks that vendors should make when developing measures: whether the measure is accurate in its measurements and whether it can help predict things like certain behaviors, events, or the chance that someone will succeed at something.
These checks roughly equate to two big concepts in the world of validity. Accuracy is broadly equivalent to what psychologists call construct validity: the degree to which a measure genuinely assesses what it says it does. So with an intelligence test, does it really measure intelligence? Construct validity is important because many of the qualities we try to assess in talent measurement are quite subjective. Ask three people for a broad definition of adaptability, and you will get roughly the same answer. But ask them to specify exactly what adaptability involves and which questions evaluate it best, and you are likely to obtain three entirely different answers.
To help ensure that methods and tools do indeed measure what they are supposed to, vendors often look at three specific things:
The ability of tests to predict certain things, meanwhile, is what psychologists call criterion validity. The thing we usually try to predict is performance, but it can be pretty much anything—from turnover and absenteeism to productivity and promotion. There are three types of criterion validity:
We have focused on predictive criterion validity in this book because it is what businesses tend to be most interested in. But it is worth being aware of the other types of validity, and we have two specific recommendations here:
Finally, there are a few other validity phrases that you will sometimes hear mentioned:
To check whether a framework is suitable for assessing people, we suggest six simple criteria.
International assignments are a common developmental tool and are seen as an opportunity for growth. But they can also be challenging. There is much debate about how high failure rates really are, but almost everyone agrees that careful selection is necessary. So what should you look for?
To begin, there are some specialist tools on the market. For the most part, they measure intercultural competence: the ability to engage with, understand, and operate within other cultures. This certainly sounds useful, and the developers of these tests invariably describe them as valid. Yet you do need to be careful here. What they usually mean by “valid” is that the tests are accurate: they genuinely measure intercultural competence. However, solid evidence that these tools are able to predict success in assignments is largely lacking.
What are the options, then? Prior international experience appears to help, but its ability to predict success is very low. Flexibility and adaptability appear to be more predictive of success, although the research on them is limited.3 The role of the Big Five personality factors has been more often studied (see chapter 2). For the most part, the ability of these factors to predict assignee success has been fairly low. For example, the validities found for conscientiousness are around 0.17, and for emotional stability around 0.10.4 However, one aspect of personality that does appear promising in predicting success is relational skills—the ability to build relationships. A combination of extraversion and agreeableness has thus been shown to have predictive validities of around 0.32.5 The idea is that the more able people are to build relationships, the more opportunities they will have to engage with and adjust to different cultures. For similar reasons, language skills are also often touted as important. Yet with validities of only around 0.2, they appear to be less predictive than relational skills.
A number of studies have now shown that even more important than the assignee's personal qualities can be the family factor. This is the role of the family, and in particular the assignee's partner, in making the move a success. The burden on the partner is often considerable, and when this aspect of a move fails, the whole assignment can fail too.6 Indeed, some studies have suggested that this is the single most important factor in determining success, or at least in avoiding failure.7 Many companies seem to focus only on the assignee. But given the research, we recommend involving the whole family in the selection process at the earliest possible stage.
Finally, remember that in looking at personality and the family factor, you should not overlook some of the standard factors used for predicting job success. Aspects like job knowledge and intelligence are just as important for overseas roles as they are for home country ones. To summarize, our recommendation for selecting international assignees is to focus on these:
Tests of people's integrity are nothing new, but they have become popular only in the past twenty years. The trigger for the increased interest in these tests was the introduction of regulations in the United States in 1988 that restricted the use of polygraphs. The annual cost to firms of some employee behaviors such as theft can be considerable, so there was a demand for other ways to identify people who were likely to engage in these behaviors. Enter integrity tests.
The use of these tests was at first mainly limited to measuring the likelihood of theft. Over the years, though, they have begun to be used to predict a broader range of counterproductive work behaviors (CWBs). These include not only theft but also absenteeism, drug use, unsafe behavior, and violence or bullying.
The vast majority of integrity tests available on the market today are psychometrics, and there are two main types.8 Overt measures do not disguise their purpose: they ask direct questions about the extent to which people have engaged in illegal or unacceptable behaviors. Covert measures, by contrast, do disguise their purpose and are usually based on standard personality tests. The idea behind these tests is that people with certain types of personality are more likely to engage in CWBs.
Do they work? There is evidence to suggest that they can, with validities of up to 0.4 being reported for their ability to predict some CWBs. This means that at their best, they are more able to predict CWBs than personality tests are able to predict job performance. Some researchers have questioned whether overt tests of integrity in particular can work, since they are easy to fake. But others have argued that regardless of whether some people fake their responses, these tests can still predict CWBs in many people. They have also been shown to be able to reduce incidences of behaviors such as theft in real work situations. So there is some evidence to suggest that in some scenarios, integrity tests can indeed be effective and add value. However, there are some big caveats here.
First, the ability of integrity tests to predict CWBs varies according to the test you use and the specific CWBs you try to predict. For example, some overt tests have been shown to be good predictors of theft, with validities of up to 0.36 reported. Yet overt tests tend to be fairly poor predictors of absenteeism, with validities of around 0.14.9 Similarly, the personality dimension of conscientiousness has been shown to be quite predictive of CWBs aimed at the organization, such as theft. Yet it is far less effective at predicting CWBs aimed at individuals, such as antisocial behavior.10
Second, the research that has been done to date on integrity tests mainly relates to moderate- to low-level jobs. There is a notable lack of evidence that these tests can be successfully used with more senior roles and more complex types of CWBs. For instance, we recently advised a financial trading business against using an integrity test as part of its selection processes for hiring traders. There is simply not enough evidence that they work in these situations.
Third, there can be some practical issues with integrity tests, such as whether and how to exclude candidates purely on the basis of their integrity test results. The problem here is that all integrity tests will inevitably produce a number of false positives. These are people who do poorly on the test but do not subsequently go on to demonstrate any CWBs. In fact, they can go on to be model corporate citizens. For this reason, integrity tests are far better used as an indicator for further investigation (for example, in an interview) than as a pass-or-fail type of test.
Finally, one aspect to be wary of when deciding whether to use integrity tests is reporting bias. Validity studies conducted by vendors selling integrity tests tend to report much higher validities than independent research into the efficacy of these tests.11 So what does all this mean for companies considering using integrity tests? We have three recommendations:
One final thing to bear in mind is that the causes of CWBs do not lie only within individuals. Contextual factors have also been shown to be important and, sometimes, to be even better predictors of CWBs than traditional integrity tests. For example, the greater the fit between a role and an individual's career goals, the less chance there is that he or she will engage in CWBs.14 And low levels of job satisfaction have been shown to be more strongly related to CWBs than scores on many integrity tests.15 More research is required, but as with trying to predict job performance, the answer is unlikely to lie only within individuals.
The answer depends on what you are trying to achieve. If you are trying to assess who takes illegal drugs (or at least fails to hide the fact that they do), then yes it can be. If you are trying to predict absence, then again, yes it can be. But if you are trying to predict accident proneness, then it is probably not useful. And if you are trying to predict performance, it is almost definitely not.16
There are two main ways of thinking about the different types of intelligence tests. The first way, which you may already be familiar with, is according to the types of questions asked. The most common forms of questions are numerical, verbal (which test someone's ability to work with language), logical, diagrammatic, and mechanical. Some tests use only one type of question; others use two or three different types. No one type is best.
As for which to use, our recommendation would be to choose the two or three that are most relevant to the roles you are assessing. We say at least two because although in general people's scores on one type of question predict their scores on the other types, this is not always so. Sometimes people do excellently on, say, a verbal test but poorly on a mathematical one. One additional thing to bear in mind here is that the amount of adverse impact created by intelligence tests sometimes differs for each type of question. This is particularly so for people of different genders and ages. Men, for example, tend to do better on numerical and diagrammatic questions, whereas women tend to do better on verbal ones. And our ability to do verbal tests seems to decline more slowly with age than our ability to do other types of tests.17
The second way to think about what type of test you need is to consider whether it is a speeded test or a power test. One of the best examples of a speeded test is the Wonderlic. This is a well-known US test that in its classic form requires people to answer fifty-two verbal and numerical questions in 12 minutes. This means, on average, they need to answer one question every 13.85 seconds. That's speeded. At the other end of the scale are power tests. They typically last 20 to 40 minutes and ask only one type of question. Each question requires more thought and takes longer to answer than in a speeded test. As for which you should use, we again recommend thinking about the requirements of the role. Is quick thinking an important part of it? And again, adverse impact can be an issue, since there are reports that speeded tests can show greater levels of adverse impact.
Finally, there are some tests available that measure very specific aspects of thinking. One of the best known is the Vienna Test System, which involves over thirty separate tests. It can measure things like coordination and the ability to divide or sustain attention or to switch between tasks. These tests are mostly used for specialized roles, such as pilots, but we have also seen them used with financial traders.
The obvious place to start is with validity. Unfortunately, though, there are few studies that directly compare the validity of different personality tests. One notable, albeit not independent, exception is Saville Consulting's recent Project Epsom, which compared the validity of four tests on a group of three hundred people and is freely available on the Internet.18 In addition, as we described in chapter 5, the Buros Institute's test reviews can be a valuable resource, although it reviews tools individually rather than comparing different ones.
So validity is clearly one thing you need to consider. But there are three other things as well. First, check whether a personality test measures what you need it to. You may, for example, be particularly interested in measuring assertiveness, in which case you need a test that does just this. Some tests measure the Big Five personality dimensions, but many tests measure other aspects of personality, so you do have a choice. Remember, though, that even if you are interested in only one thing, you will need a test that measures multiple aspects of personality. For example, you may be trying to identify assertive people, but it is also useful to know how empathic or socially sensitive they are. After all, being assertive while also being sensitive to others is one thing; being both assertive and insensitive is quite another.
To make their tests more obviously relevant to a business and easier to use, some vendors offer to back-end their test onto your competency framework. This involves aligning their test to the different competencies of your framework and then presenting individuals' results as scores for each of the areas. However, you should do this only if there is a close alignment between the elements that a test measures and your competency framework. All too often we have seen businesses and vendors stretching reality and saying that there is alignment when there is not. And when this happens, the results that you obtain can mislead. Having a test that is accurate but less easy to use can present problems. But having one that is easy to use but inaccurate or misleading is pointless. Personality tests are designed to measure certain things, and there is only so much you can change about them.
Second, you need to look at the outputs available—the type of report that the tests provide. Ideally, you are looking for one that interprets scores for you in easy language that every manager will understand but does not provide too much detail. Our general rule is that a paragraph for each score is fine; a page is too much. Importantly, check to see whether the test has the capacity to evaluate fit with job requirements and business culture. Finally, some reports also provide suggestions for interview questions based on the results of the test. Yet in our experience, these questions tend to be basic and not very effective or useful. So unless your interviewers are particularly inexperienced, we would avoid these “interview” reports.
Finally, you need to think about whether you want a normative test or an ipsative one. The most common type of psychometric is a normative test: typically people rate aspects of their behavior or personality in response to certain questions. At the end, scores are calculated by adding up the responses and, importantly, are compared to a comparison norm group. Normative tests, then, allow you to compare individuals.
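As a concrete sketch of how normative scoring works, a raw scale score is standardized against the mean and spread of the norm group, which is what puts different people's scores on a common scale. The scale name and norm values below are invented purely for illustration:

```python
# Illustrative sketch of normative scoring (all numbers invented).
# A raw score is standardized against a norm group, so two candidates'
# results sit on the same scale and can be compared directly.

def z_score(raw, norm_mean, norm_sd):
    """Standardize a raw scale score against a norm group."""
    return (raw - norm_mean) / norm_sd

# Hypothetical "assertiveness" norm group: mean 30, standard deviation 5.
candidate_a = z_score(36, norm_mean=30, norm_sd=5)  # 1.2 SDs above the norm
candidate_b = z_score(27, norm_mean=30, norm_sd=5)  # 0.6 SDs below the norm

print(candidate_a, candidate_b)
```

Because both candidates are expressed relative to the same norm group, it is meaningful to say that candidate A is more assertive than candidate B, which is exactly the comparison normative tests are designed to support.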
The second main type of test, ipsative or forced choice, is very different. People have to choose between different statements: for example, “Which word describes you best: assertive or fun-loving?” After a sufficient number of choices have been made, the test uses some clever mathematics to produce overall scores. Some of the best-known personality tests, including the MBTI, are ipsative.
There are lengthy debates about which type is better, and some providers have tried to produce tests that are a bit of both. The debate continues, but as a general rule, if you want to be able to compare the results of different people (such as in some recruitment situations), you should use only normative tests. Ipsative tests are not designed to do this, so if you try to compare the results of people using them, you will get some strange, invalid, and unreliable results. Knowing what you need to use the test for is important.
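To see why ipsative results should not be compared across people, note that each forced choice awards a point to one scale at the expense of the others, so every respondent's scale scores sum to the same total. The sketch below, with invented scales and choices, makes this concrete:

```python
# Illustrative sketch of ipsative (forced-choice) scoring; the scales
# and choices are invented. Each item gives one point to whichever
# scale the person picks, so totals always equal the number of items.

from collections import Counter

SCALES = ["assertive", "fun_loving", "analytical"]

def ipsative_scores(choices):
    """Tally forced choices; one point per item to the chosen scale."""
    scores = Counter({scale: 0 for scale in SCALES})
    scores.update(choices)
    return scores

person_1 = ipsative_scores(["assertive"] * 8 + ["analytical"] * 4)
person_2 = ipsative_scores(["fun_loving"] * 6 + ["assertive"] * 6)

# Both people "spent" the same 12 points, just distributed differently.
# A high score on one scale forces lower scores on the others, so the
# scores describe priorities within a person, not levels between people.
print(sum(person_1.values()), sum(person_2.values()))
```

This is the sense in which comparing absolute ipsative scores between candidates produces misleading results: person 1's higher "assertive" score may simply reflect different trade-offs, not more assertiveness.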
Consciously or unconsciously, the vast majority of people try to present themselves in a good way when they are being assessed. With measurement methods in which there is some human contact, such as interviews, assessment centers, and individual psychological assessments, businesses tend to worry less about the impact of faking. This is presumably because they assume that assessors can see through any faking or impression management. Yet with indirect methods such as personality tests, we often hear managers say they are worried that faking could reduce the effectiveness of the tests, since people may appear to be different from how they really are.
The response of some researchers and vendors is that scores on personality tests still predict performance, regardless of whether the results have been achieved through honest answers or faking. Strictly speaking, this is true. But whether you should still be worried depends on the situation in which you are using the personality tests.
If you are using them to assess and sift a large number of candidates, then the researchers are probably right. This is because although a few fakers may slip through, the results will generally still be predictive of performance. However, in selection situations that do not involve large numbers of people, faking may be more of an issue. We thus recommend always talking through the test results with candidates so you have a chance to get a sense of whether the results are accurate.
We have already mentioned the two basic types of situational judgment test: those that ask knowledge-based questions (what the correct answer is) and those that ask behavioral tendency questions (how people typically act). Each has its pros and cons, and both can be effective (see chapter 2).
One clear preference that we have is to use tests that also allow you to assess an individual's level of confidence in her or his answer to each question. The advantage of this is that it allows you to distinguish among four groups of people, each with a different level of training need:
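A minimal sketch of how such a confidence-plus-correctness breakdown can be scored follows; the four group labels are our own illustrative shorthand for the four training-need profiles, not terms taken from any particular test:

```python
# Illustrative sketch: classifying situational judgment test responses
# by whether the answer was correct and whether the respondent was
# confident in it. The group labels are invented shorthand.

def training_group(correct: bool, confident: bool) -> str:
    if correct and confident:
        return "competent and aware"    # little training need
    if correct and not confident:
        return "competent but unsure"   # needs confidence-building
    if not correct and confident:
        return "confidently wrong"      # priority retraining target
    return "aware of the gap"           # receptive to standard training

print(training_group(correct=False, confident=True))
```

The practically important group is the confidently wrong one: without the confidence rating, these people would be indistinguishable from those who know they are unsure.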
We looked at how to choose, contract, and manage vendors in chapter 8. To choose one specifically to provide IPAs, we suggest you consider four criteria:
Notes
1. Saville, P., MacIver, R., Kurz, R., & Hopton, T. (2008, January). Project Epsom: How valid is a questionnaire? Paper presented at a conference, Stratford-upon-Avon. Jersey: Saville Consulting Group.
2. Cook, M. (2009). Personnel selection: Adding value through people. Chichester, West Sussex: Wiley.
3. Arthur, W., & Bennett, W. (1995). The international assignee: The relative importance of factors perceived to contribute to success. Personnel Psychology, 48, 99–114.
4. Mol, S. T., Born, M. P., Willemsen, M. E., & Van der Molen, H. T. (2005). Predicting expatriate job performance for selection purposes: A quantitative review. Journal of Cross-Cultural Psychology, 36, 590–620.
5. Bhaskar-Shrinivas, P., Harrison, D. A., Shaffer, M. A., & Luk, D. M. (2005). Input-based and time-based models of international adjustment: Meta-analytic evidence and theoretical extensions. Academy of Management Journal, 48, 257–281.
6. Black, J. S., & Stephens, G. K. (1989). The influence of the spouse on American expatriate adjustment and intent to stay in Pacific Rim overseas assignments. Journal of Management, 15(4), 529–544.
7. Arthur & Bennett. (1995).
8. Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology, 78, 679–703.
9. Ones, Viswesvaran, & Schmidt. (1993).
10. Salgado, J. F. (2002). The Big Five personality dimensions and counterproductive behaviours. International Journal of Selection and Assessment, 10, 117–125.
11. Van Iddekinge, C. H., Roth, P. L., Raymark, P. H., & Odle-Dusseau, H. N. (2012). The criterion-related validity of integrity tests: An updated meta-analysis. Journal of Applied Psychology, 97(3), 499–530.
12. Ones, Viswesvaran, & Schmidt. (1993).
13. Berry, C. M., Sackett, P. R., & Wiemann, S. (2007). A review of recent developments in integrity test research. Personnel Psychology, 60, 271–301.
14. Huiras, J., Uggen, C., & McMorris, B. (2000). Career jobs, survival jobs, and employee deviance: A social investment model of workplace misconduct. Sociological Quarterly, 41, 245–263.
15. Hershcovis, M. S., Turner, N., Barling, J., Arnold, K. A., Dupré, K. E., Inness, M., LeBlanc, M. M., & Sivanathan, N. (2007). Predicting workplace aggression: A meta-analysis. Journal of Applied Psychology, 92, 228–238.
16. Cook. (2009).
17. Avolio, B. J., & Waldman, D. A. (1994). Variation in cognitive, perceptual, and psychomotor abilities across the working life span: Examining the effects of race, sex, experience, education, and occupational type. Psychology and Aging, 9, 430–442.
18. Saville et al. (2008).