Appendix: Frequently Asked Questions
This appendix contains a collection of specific, practical questions that we are commonly asked. There are some big and important issues here, such as how to check whether a competency framework is suitable for assessment and whether it is possible to measure integrity. For the most part, you do not need to have read the rest of the book to understand the answers. But we suggest you look at the definition of validity in chapter 2 if you have not read it yet. The questions we address are:
If you would like to see more frequently asked questions and answers, or to pose a question of your own, you can do so on our blog (www.measuringtalent.com).
For the most part, we have tried to avoid using technical language in this book. This is particularly true when it comes to the issue of validity. To keep things simple, we have mainly discussed validity as if it is just one thing: the ability to predict future workplace success. Yet this is only one very specific gauge of validity, and there are others that it might be useful to know about.
We hinted at this in chapter 2 when we described two basic quality checks that vendors should make when developing measures: whether the measure is accurate in its measurements and whether it can help predict things like certain behaviors, events, or the chance that someone will succeed at something.
These checks roughly equate to two big concepts in the world of validity. Accuracy is broadly equivalent to what psychologists call construct validity: the degree to which a measure genuinely assesses what it says it does. So with an intelligence test, does it really measure intelligence? Construct validity is important because many of the qualities we try to assess in talent measurement are quite subjective. Ask three people for a broad definition of adaptability, and you will get roughly the same answer. But ask them to specify exactly what adaptability involves and which questions evaluate it best, and you are likely to obtain three entirely different answers.
To help ensure that methods and tools do indeed measure what they are supposed to, vendors often look at three specific things:
The ability of tests to predict certain things, meanwhile, is what psychologists call criterion validity. The thing we usually try to predict is performance, but it can be pretty much anything—from turnover and absenteeism to productivity and promotion. There are three types of criterion validity:
We have focused on predictive criterion validity in this book because it is what businesses tend to be most interested in. But it is worth being aware of the other types of validity, and we have two specific recommendations here:
Finally, there are a few other validity phrases that you will sometimes hear mentioned:
To check whether a framework is suitable for assessing people, we suggest six simple criteria.
International assignments are a common developmental tool and are seen as an opportunity for growth. But they can also be challenging. There is much debate about how high failure rates really are, but almost everyone agrees that careful selection is necessary. So what should you look for?
To begin, there are some specialist tools on the market. For the most part, they measure intercultural competence: the ability to engage with, understand, and operate within other cultures. This certainly sounds useful, and the developers of these tests invariably describe them as valid. Yet you do need to be careful here. What they usually mean by “valid” is that the tests are accurate: they genuinely measure intercultural competence. However, solid evidence that these tools are able to predict success in assignments is largely lacking.
What are the options, then? Prior international experience appears to help, but its ability to predict success is very low. Flexibility and adaptability appear to be more predictive of success, although the research on them is limited.3 The role of the Big Five personality factors has been more often studied (see chapter 2). For the most part, the ability of these factors to predict assignee success has been fairly low. For example, the validities found for conscientiousness are around 0.17, and for emotional stability around 0.10.4 However, one aspect of personality that does appear promising in predicting success is relational skills—the ability to build relationships. A combination of extraversion and agreeableness has thus been shown to have predictive validities of around 0.32.5 The idea is that the more able people are to build relationships, the more opportunities they will have to engage with and adjust to different cultures. For similar reasons, language skills are also often touted as important. Yet with validities of only around 0.2, they appear to be less predictive than relational skills.
A number of studies have now shown that even more important than the assignee's personal qualities can be the family factor. This is the role of the family, and in particular the assignee's partner, in making the move a success. The burden on the partner is often considerable, and when this aspect of a move fails, the whole assignment can fail too.6 Indeed, some studies have suggested that this is the single most important factor in determining success, or at least in avoiding failure.7 Many companies seem to focus only on the assignee. But given the research, we recommend involving the whole family in the selection process at the earliest possible stage.
Finally, remember that in looking at personality and the family factor, you should not overlook some of the standard factors used for predicting job success. Aspects like job knowledge and intelligence are just as important for overseas roles as they are for home country ones. To summarize, our recommendation for selecting international assignees is to focus on these:
Tests of people's integrity are nothing new, but they have become popular only in the past twenty years. The trigger for the increased interest in these tests was the introduction of regulations in the United States in 1988 that restricted the use of polygraphs. The annual cost to firms of some employee behaviors such as theft can be considerable, so there was a demand for other ways to identify people who were likely to engage in these behaviors. Enter integrity tests.
The use of these tests was at first mainly limited to measuring the likelihood of theft. Over the years, though, they have begun to be used to predict a broader range of counterproductive work behaviors (CWBs). These include not only theft but also absenteeism, drug use, unsafe behavior, and violence or bullying.
The vast majority of integrity tests available on the market today are psychometrics, and there are two main types.8 Overt measures do not disguise their purpose: they ask direct questions about the extent to which people have engaged in illegal or unacceptable behaviors. Covert measures, by contrast, do disguise their purpose and are usually based on standard personality tests. The idea behind these tests is that people with certain types of personality are more likely to engage in CWBs.
Do they work? There is evidence to suggest that they can, with validities of up to 0.4 being reported for their ability to predict some CWBs. This means that at their best, they are more able to predict CWBs than personality tests are able to predict job performance. Some researchers have questioned whether overt tests of integrity in particular can work, since they are easy to fake. But others have argued that regardless of whether some people fake their responses, these tests can still predict CWBs in many people. They have also been shown to be able to reduce incidences of behaviors such as theft in real work situations. So there is some evidence to suggest that in some scenarios, integrity tests can indeed be effective and add value. However, there are some big caveats here.
First, the ability of integrity tests to predict CWBs varies according to the test you use and the specific CWBs you try to predict. For example, some overt tests have been shown to be good predictors of theft, with validities of up to 0.36 reported. Yet overt tests tend to be fairly poor predictors of absenteeism, with validities of around 0.14.9 Similarly, the personality dimension of conscientiousness has been shown to be quite predictive of CWBs aimed at the organization, such as theft. Yet it is far less effective at predicting CWBs aimed at individuals, such as antisocial behavior.10
Second, the research that has been done to date on integrity tests mainly relates to moderate- to low-level jobs. There is a notable lack of evidence that these tests can be successfully used with more senior roles and more complex types of CWBs. For instance, we recently advised a financial trading business against using an integrity test as part of its selection processes for hiring traders. There is simply not enough evidence that they work in these situations.
Third, there can be some practical issues with integrity tests, such as whether and how to exclude candidates purely on the basis of their integrity test results. The problem here is that all integrity tests will inevitably produce a number of false positives. These are people who do poorly on the test but do not subsequently go on to demonstrate any CWBs. In fact, they can go on to be model corporate citizens. For this reason, integrity tests are far better used as an indicator for further investigation (for example, in an interview) than as a pass-or-fail type of test.
Finally, one aspect to be wary of when deciding whether to use integrity tests is reporting bias. Validity studies conducted by vendors selling integrity tests tend to report much higher validities than independent research into the efficacy of these tests.11 So what does all this mean for companies considering using integrity tests? We have three recommendations:
One final thing to bear in mind is that the causes of CWBs do not lie only within individuals. Contextual factors have also been shown to be important and, sometimes, to be even better predictors of CWBs than traditional integrity tests. For example, the greater the fit between a role and an individual's career goals, the less chance there is that he or she will engage in CWBs.14 And low levels of job satisfaction have been shown to be more strongly related to CWBs than scores on many integrity tests.15 More research is required, but as with trying to predict job performance, the answer is unlikely to lie only within individuals.
The answer depends on what you are trying to achieve. If you are trying to assess who takes illegal drugs (or at least fails to hide the fact that they do), then yes it can be. If you are trying to predict absence, then again, yes it can be. But if you are trying to predict accident proneness, then it is probably not useful. And if you are trying to predict performance, it is almost definitely not.16
There are two main ways of thinking about the different types of intelligence tests. The first way, which you may already be familiar with, is according to the types of questions asked. The most common forms of questions are numerical, verbal (which test someone's ability to work with language), logical, diagrammatic, and mechanical. Some tests use only one type of question; others use two or three different types. No one type is best.
As for which to use, our recommendation would be to choose the two or three that are most relevant to the roles you are assessing. We say at least two because although in general people's scores on one type of question predict their scores on the other types, this is not always so. Sometimes people do excellently on, say, a verbal test but poorly on a mathematical one. One additional thing to bear in mind here is that the amount of adverse impact created by intelligence tests sometimes differs for each type of question. This is particularly so for people of different genders and ages. Men, for example, tend to do better on numerical and diagrammatic questions, whereas women tend to do better on verbal ones. And our ability to do verbal tests seems to decline more slowly with age than our ability to do other types of tests.17
The second way to think about what type of test you need is to consider whether it is a speeded test or a power test. One of the best examples of a speeded test is the Wonderlic. This is a well-known US test that in its classic form requires people to answer fifty-two verbal and numerical questions in 12 minutes. This means, on average, they need to answer one question every 13.85 seconds. That's speeded. At the other end of the scale are power tests. They typically last 20 to 40 minutes and ask only one type of question. Each question requires more thought and takes longer to answer than in a speeded test. As for which you should use, we again recommend thinking about the requirements of the role. Is quick thinking an important part of it? And again, adverse impact can be an issue, since there are reports that speeded tests can show greater levels of adverse impact.
Finally, there are some tests available that measure very specific aspects of thinking. One of the best known is the Vienna Test System, which involves over thirty separate tests. It can measure things like coordination and the ability to divide or sustain attention or to switch between tasks. These tests are mostly used for specialized roles, such as pilots, but we have also seen them used with financial traders.
The obvious place to start is with validity. Unfortunately, though, there are few studies that directly compare the validity of different personality tests. One notable, albeit not independent, exception is Saville Consulting's recent Project Epsom, which compared the validity of four tests on a group of three hundred people and is freely available on the Internet.18 In addition, as we described in chapter 5, the Buros Institute's test reviews can be a valuable resource, although it reviews tools individually rather than comparing different ones.
So validity is clearly one thing you need to consider. But there are three other things as well. First, check whether a personality test measures what you need it to. You may, for example, be particularly interested in measuring assertiveness, in which case you need a test that does just this. Some tests measure the Big Five personality dimensions, but many tests measure other aspects of personality, so you do have a choice. Remember, though, that even if you are interested in only one thing, you will need a test that measures multiple aspects of personality. For example, you may be trying to identify assertive people, but it is also useful to know how empathic or socially sensitive they are. After all, being assertive while also being sensitive to others is one thing; being both assertive and insensitive is quite another.
To make their tests more obviously relevant to a business and easier to use, some vendors offer to back-end their test onto your competency framework. This involves aligning their test to the different competencies of your framework and then presenting individuals' results as scores for each of the areas. However, you should do this only if there is a close alignment between the elements that a test measures and your competency framework. All too often we have seen businesses and vendors stretching reality and saying that there is alignment when there is not. And when this happens, the results that you obtain can mislead. Having a test that is accurate but less easy to use can present problems. But having one that is easy to use but inaccurate or misleading is pointless. Personality tests are designed to measure certain things, and there is only so much you can change about them.
Second, you need to look at the outputs available—the type of report that the tests provide. Ideally, you are looking for one that interprets scores for you in easy language that every manager will understand but does not provide too much detail. Our general rule is that a paragraph for each score is fine; a page is too much. Importantly, check to see whether the test has the capacity to evaluate fit with job requirements and business culture. Finally, some reports also provide suggestions for interview questions based on the results of the test. Yet in our experience, these questions tend to be basic and not very effective or useful. So unless your interviewers are particularly inexperienced, we would avoid these “interview” reports.
Finally, you need to think about whether you want a normative test or an ipsative one. The most common type of psychometric is a normative test: typically people rate aspects of their behavior or personality in response to certain questions. At the end, scores are calculated by adding up the responses and, importantly, are compared to a comparison norm group. Normative tests, then, allow you to compare individuals.
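As a concrete sketch of how normative scoring works, a raw scale score is standardized against the mean and spread of the norm group, which is what puts different people's scores on a common scale. The scale name and norm values below are invented purely for illustration:

```python
# Illustrative sketch of normative scoring (all numbers invented).
# A raw score is standardized against a norm group, so two candidates'
# results sit on the same scale and can be compared directly.

def z_score(raw, norm_mean, norm_sd):
    """Standardize a raw scale score against a norm group."""
    return (raw - norm_mean) / norm_sd

# Hypothetical "assertiveness" norm group: mean 30, standard deviation 5.
candidate_a = z_score(36, norm_mean=30, norm_sd=5)  # 1.2 SDs above the norm
candidate_b = z_score(27, norm_mean=30, norm_sd=5)  # 0.6 SDs below the norm

print(candidate_a, candidate_b)
```

Because both candidates are expressed relative to the same norm group, it is meaningful to say that candidate A is more assertive than candidate B, which is exactly the comparison normative tests are designed to support.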
The second main type of test, ipsative or forced choice, is very different. People have to choose between different statements: for example, “Which word describes you best: assertive or fun-loving?” After a sufficient number of choices have been made, the test uses some clever mathematics to produce overall scores. Some of the best-known personality tests, including the MBTI, are ipsative.
There are lengthy debates about which type is better, and some providers have tried to produce tests that are a bit of both. The debate continues, but as a general rule, if you want to be able to compare the results of different people (such as in some recruitment situations), you should use only normative tests. Ipsative tests are not designed to do this, so if you try to compare the results of people using them, you will get some strange, invalid, and unreliable results. Knowing what you need to use the test for is important.
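To see why ipsative results should not be compared across people, note that each forced choice awards a point to one scale at the expense of the others, so every respondent's scale scores sum to the same total. The sketch below, with invented scales and choices, makes this concrete:

```python
# Illustrative sketch of ipsative (forced-choice) scoring; the scales
# and choices are invented. Each item gives one point to whichever
# scale the person picks, so totals always equal the number of items.

from collections import Counter

SCALES = ["assertive", "fun_loving", "analytical"]

def ipsative_scores(choices):
    """Tally forced choices; one point per item to the chosen scale."""
    scores = Counter({scale: 0 for scale in SCALES})
    scores.update(choices)
    return scores

person_1 = ipsative_scores(["assertive"] * 8 + ["analytical"] * 4)
person_2 = ipsative_scores(["fun_loving"] * 6 + ["assertive"] * 6)

# Both people "spent" the same 12 points, just distributed differently.
# A high score on one scale forces lower scores on the others, so the
# scores describe priorities within a person, not levels between people.
print(sum(person_1.values()), sum(person_2.values()))
```

This is the sense in which comparing absolute ipsative scores between candidates produces misleading results: person 1's higher "assertive" score may simply reflect different trade-offs, not more assertiveness.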
Consciously or unconsciously, the vast majority of people try to present themselves in a good way when they are being assessed. With measurement methods in which there is some human contact, such as interviews, assessment centers, and individual psychological assessments, businesses tend to worry less about the impact of faking. This is presumably because they assume that assessors can see through any faking or impression management. Yet with indirect methods such as personality tests, we often hear managers say they are worried that faking could reduce the effectiveness of the tests, since people may appear to be different from how they really are.
The response of some researchers and vendors is that scores on personality tests still predict performance, regardless of whether the results have been achieved through honest answers or faking. Strictly speaking, this is true. But whether you should still be worried depends on the situation in which you are using the personality tests.
If you are using them to assess and sift a large number of candidates, then the researchers are probably right. This is because although a few fakers may slip through, the results will generally still be predictive of performance. However, in selection situations that do not involve large numbers of people, faking may be more of an issue. We thus recommend always talking through the test results with candidates so you have a chance to get a sense of whether the results are accurate.
We have already mentioned the two basic types of situational judgment test: those that ask knowledge-based questions (what the correct answer is) and those that ask behavioral tendency questions (how people typically act). Each has its pros and cons, and both can be effective (see chapter 2).
One clear preference that we have is to use tests that also allow you to assess an individual's level of confidence in her or his answer to each question. The advantage of this is that it allows you to distinguish among four groups of people, each with a different level of training need:
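A minimal sketch of how such a confidence-plus-correctness breakdown can be scored follows; the four group labels are our own illustrative shorthand for the four training-need profiles, not terms taken from any particular test:

```python
# Illustrative sketch: classifying situational judgment test responses
# by whether the answer was correct and whether the respondent was
# confident in it. The group labels are invented shorthand.

def training_group(correct: bool, confident: bool) -> str:
    if correct and confident:
        return "competent and aware"    # little training need
    if correct and not confident:
        return "competent but unsure"   # needs confidence-building
    if not correct and confident:
        return "confidently wrong"      # priority retraining target
    return "aware of the gap"           # receptive to standard training

print(training_group(correct=False, confident=True))
```

The practically important group is the confidently wrong one: without the confidence rating, these people would be indistinguishable from those who know they are unsure.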
We looked at how to choose, contract, and manage vendors in chapter 8. To choose one specifically to provide IPAs, we suggest you consider four criteria:
Notes
1. Saville, P., MacIver, R., Kurz, R., & Hopton, T. (2008, January). Project Epsom: How valid is a questionnaire? Paper presented at a conference, Stratford-upon-Avon. Jersey: Saville Consulting Group.
2. Cook, M. (2009). Personnel selection: Adding value through people. Chichester, West Sussex: Wiley.
3. Arthur, W., & Bennett, W. (1995). The international assignee: The relative importance of factors perceived to contribute to success. Personnel Psychology, 48, 99–114.
4. Mol, S. T., Born, M. P., Willemsen, M. E., & Van der Molen, H. T. (2005). Predicting expatriate job performance for selection purposes: A quantitative review. Journal of Cross-Cultural Psychology, 36, 590–620.
5. Bhaskar-Shrinivas, P., Harrison, D. A., Shaffer, M. A., & Luk, D. M. (2005). Input-based and time-based models of international adjustment: Meta-analytic evidence and theoretical extensions. Academy of Management Journal, 48, 257–281.
6. Black, J. S., & Stephens, G. K. (1989). The influence of the spouse on American expatriate adjustment and intent to stay in Pacific Rim overseas assignments. Journal of Management, 15(4), 529–544.
7. Arthur & Bennett. (1995).
8. Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology, 78, 679–703.
9. Ones, Viswesvaran, & Schmidt. (1993).
10. Salgado, J. F. (2002). The Big Five personality dimensions and counterproductive behaviours. International Journal of Selection and Assessment, 10, 117–125.
11. Van Iddekinge, C. H., Roth, P. L., Raymark, P. H., & Odle-Dusseau, H. N. (2012). The criterion-related validity of integrity tests: An updated meta-analysis. Journal of Applied Psychology, 97(3), 499–530.
12. Ones, Viswesvaran, & Schmidt. (1993).
13. Berry, C. M., Sackett, P. R., & Wiemann, S. (2007). A review of recent developments in integrity test research. Personnel Psychology, 60, 271–301.
14. Huiras, J., Uggen, C., & McMorris, B. (2000). Career jobs, survival jobs, and employee deviance: A social investment model of workplace misconduct. Sociological Quarterly, 41, 245–263.
15. Hershcovis, M. S., Turner, N., Barling, J., Arnold, K. A., Dupré, K. E., Inness, M., LeBlanc, M. M., & Sivanathan, N. (2007). Predicting workplace aggression: A meta-analysis. Journal of Applied Psychology, 92, 228–238.
16. Cook. (2009).
17. Avolio, B. J., & Waldman, D. A. (1994). Variation in cognitive, perceptual, and psychomotor abilities across the working life span: Examining the effects of race, sex, experience, education, and occupational type. Psychology and Aging, 9, 430–442.
18. Saville et al. (2008).