Rater Errors and Bias

A rater error is an error in performance appraisal that reflects consistent biases on the part of the rater. One of the most prominent rater errors is halo error, the tendency to rate similarly across dimensions.33

There are at least two causes of halo error:34 (1) A supervisor may make an overall judgment about a worker and then conform all dimensional ratings to that judgment and/or (2) a supervisor may make all ratings consistent with the worker’s performance level on a dimension that is important to the supervisor. If Nancy rates Luis low on all three performance dimensions (quality of programs written, quantity of programs written, and interpersonal effectiveness) even though his performance on quality and quantity is high, then she has committed a halo error.

Another type of rater error is restriction of range error, which occurs when a manager restricts all of his or her ratings to a small portion of the rating scale. Three different forms of range restriction are common: leniency errors, or restricting ratings to the high portion of the scale; central tendency errors, or using only the middle points of the scale; and severity errors, or using only the low portion of the rating scale.
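The three forms of range restriction can be sketched as a simple screen over one rater's scores. This is a minimal illustration, not a standard method: the 1-to-5 scale, the rule of splitting the scale into thirds, and the function name `classify_range_restriction` are all assumptions made for the example. A flagged pattern is only a prompt for a closer look, since a rater with a uniformly strong or weak group could produce the same scores accurately.

```python
def classify_range_restriction(ratings, scale_min=1, scale_max=5):
    """Label one rater's ratings by the portion of the scale they occupy.

    Hypothetical rule: divide the scale into thirds and check whether
    every rating falls in the top, bottom, or middle third.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    third = (scale_max - scale_min) / 3
    low_cut = scale_min + third    # upper edge of the low third
    high_cut = scale_max - third   # lower edge of the high third
    if all(r >= high_cut for r in ratings):
        return "leniency"
    if all(r <= low_cut for r in ratings):
        return "severity"
    if all(low_cut < r < high_cut for r in ratings):
        return "central tendency"
    return "no restriction detected"


# Illustrative data only: ratings each supervisor gave to five subordinates.
ratings_by_supervisor = {
    "Supervisor A": [5, 4, 5, 5, 4],
    "Supervisor B": [3, 3, 3, 3, 3],
    "Supervisor C": [1, 2, 2, 1, 2],
    "Supervisor D": [2, 4, 3, 5, 1],
}

labels = {
    name: classify_range_restriction(scores)
    for name, scores in ratings_by_supervisor.items()
}
for name, label in labels.items():
    print(f"{name}: {label}")
```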

Suppose that you are an HR manager reviewing the performance ratings given by the company’s supervisors to their subordinates. The question is: How can you tell how accurate these ratings are? In other words, how can you tell what types of rating error, if any, have colored the ratings? It is very difficult to tell. Let us say that a supervisor has given one of her subordinates the highest possible rating on each of five performance dimensions. There are at least three possible explanations. The employee may actually be very good on one of the dimensions and may have been rated very high on all of them because of this (halo error). Or the rater may use only the top part of the scale (leniency error). Or the employee may be a very good all-around worker (accurate). Although sophisticated statistical techniques have been developed to investigate these possibilities, none is practical for most organizations or managers. Further, current research indicates that “errors” in ratings can sufficiently represent “true” ratee performance levels (the “accurate” possibility presented previously), such that rater errors are not good indicators of inaccuracy in rating.35

Personal bias may also cause errors in evaluation. Consciously or unconsciously, a supervisor may systematically rate certain workers lower or higher than others on the basis of race, national origin, sex, age, or other factors. Conscious bias is extremely difficult, if not impossible, to eliminate. Unconscious bias can be overcome once it is brought to the rater’s attention. For example, a supervisor might be unconsciously giving higher evaluations to employees who went to his alma mater. When made aware of this leaning, however, he may correct it.

Blatant, systematic negative biases should be recognized and corrected within the organization. Negative bias became an issue at the U.S. Drug Enforcement Administration (DEA) in the early 1980s when a lawsuit, Segar v. Civiletti, established that African American agents were systematically rated lower than white agents and, thus, were less likely to receive promotions and choice job assignments. The DEA failed to provide supervisors with any written instructions on how to evaluate agents’ performance, and virtually all the supervisors conducting the evaluations were white.36

A major difficulty in performance measurement is ensuring comparability in ratings across raters.37 Comparability refers to the degree to which the performance ratings given by various supervisors in an organization are based on similar standards. In essence, the comparability issue is concerned with whether supervisors use the same measurement yardsticks. What one supervisor considers excellent performance, another may view as only average.

One of the most effective ways to deal with errors and bias is to develop and communicate evaluation standards via frame-of-reference (FOR) training,38 which uses prepared behavioral examples of performance that a worker might exhibit. After rating the performance presented on video or paper, the trainees in a typical FOR session are told what their ratings should have been. Discussion of which worker behaviors represent each dimension (and why) follows. This process of rating, feedback, and discussion is followed by the presentation of another example. Again, rating, feedback, and discussion follow. The process continues until the appraisers develop a common frame of reference for performance evaluation. In other words, FOR training is all about calibrating everyone to the same performance standards.39
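The rate, feedback, and discuss cycle can be pictured as tracking how far a trainee's ratings fall from the target ratings on each practice example. The sketch below is only an illustration under stated assumptions: the three dimensions, the 1-to-5 scale, the sample numbers, and the use of mean absolute deviation as the distance measure are all hypothetical choices, not a prescribed FOR procedure.

```python
def mean_absolute_deviation(trainee_ratings, target_ratings):
    """Average absolute gap between a trainee's ratings and the targets."""
    if len(trainee_ratings) != len(target_ratings):
        raise ValueError("rating lists must cover the same dimensions")
    gaps = [abs(t - g) for t, g in zip(trainee_ratings, target_ratings)]
    return sum(gaps) / len(gaps)


# Illustrative data only. Each practice example has target ratings on three
# dimensions (quality, quantity, interpersonal effectiveness) and the
# ratings one trainee gave before receiving feedback.
practice_examples = [
    {"target": [4, 2, 5], "trainee": [2, 4, 3]},
    {"target": [3, 5, 1], "trainee": [4, 4, 2]},
    {"target": [2, 3, 4], "trainee": [2, 3, 4]},
]

deviations = [
    mean_absolute_deviation(example["trainee"], example["target"])
    for example in practice_examples
]
for round_number, deviation in enumerate(deviations, start=1):
    print(f"Example {round_number}: average gap from target = {deviation:.2f}")
```

A shrinking gap across examples is what a shared frame of reference would look like in this toy setup; a gap that stays large would suggest more discussion is needed.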

FOR training has consistently been found to increase the accuracy of performance ratings.40 Perhaps even more important, it develops common evaluation standards among supervisors.

The FOR training procedure does have a number of drawbacks, though. One glaring problem is the expense, which can be prohibitive owing to the amount of time and number of people involved. Another drawback is that it can be used only with behaviorally based appraisal systems.

The Influence of Liking

Liking can cause errors in performance appraisals when raters allow their like or dislike of an individual to influence their assessment of that person’s performance. Field studies have found rater liking and performance ratings to be substantially correlated.41 Findings of a correlation might indicate that performance ratings are biased by rater liking. However, good raters may tend to like good performers and dislike poor performers.

The fundamental question, of course, is whether the relationship between liking and performance ratings is appropriate or biased.42 It is appropriate if supervisors like good performers better than poor performers. It is biased if supervisors like or dislike employees for reasons other than their performance and then allow these feelings to contaminate their ratings. It can be difficult to determine if an influence of liking on performance ratings is appropriate or due to bias. Managers may be able to separate their liking from employee performance and, thus, eliminate the possibility that liking biases the performance ratings.43 Nonetheless, most workers appear to believe that their supervisor’s liking for them influences the performance ratings they receive.44 The perception of bias can cause communication problems between workers and supervisors and lower supervisors’ effectiveness in managing performance.

Given the potentially biasing impact of liking, it is critical that supervisors manage their emotional reactions to workers. One approach that may be helpful is to keep a performance diary of observed behavior for each worker45 to serve as the basis for evaluation and other managerial actions. An external record of worker behaviors can dramatically reduce error and bias in ratings.

Recordkeeping should be done routinely—for example, daily or weekly. Keeping records of employee performance is a professional habit worth developing, particularly to safeguard against litigation that challenges the fairness of appraisals.46 To prevent error and bias, the record should reflect what each worker has been doing, not opinions or inferences about the behavior. Further, the record should present a balanced and complete picture by including all performance incidents—positive, negative, or average. A good question to ask yourself is whether someone else reading the record would reach the same conclusion about the level of performance as you have.

In one field study of such recordkeeping, supervisors reported that the task took five minutes or fewer per week.47 More important, the majority of supervisors reported that they would prefer to continue, rather than discontinue, the recording of behavioral incidents. By compiling a weekly record, they did not have to rely much on general impressions and possibly biased memories when conducting appraisals. In addition, the practice signaled workers that appraisal was not a personality contest. Finally, the diaries provided a legal justification for the appraisal process: The supervisor could cite concrete behavioral examples that justified the rating.

Two warnings are in order here. First, performance diaries are not guarantees against bias due to liking, because supervisors can be biased in the type of incidents they choose to record. However, short of intentional misrepresentation, the keeping of such records should help reduce both actual bias and the perception of bias.

Second, it is unfair to keep a secret running list of “offenses” and then suddenly unveil it to the employee when he or she commits an infraction that can’t be overlooked. The message for managers is simple: If an employee’s behavior warrants discussion, the discussion should take place immediately.48

Organizational Politics

Thus far, we have taken a rational perspective on appraisal.49 In other words, we have assumed that the value of each worker’s performance can be estimated. Unlike the rational approach, the political perspective assumes that the value of a worker’s performance depends on the agenda, or goals, of the supervisor.50 Consider the following quote from an executive with extensive experience in evaluating his subordinates:

As a manager, I will use the review process to do what is best for my people and the division. . . . I’ve got a lot of leeway—call it discretion—to use this process in that manner. . . . I’ve used it to get my people better raises in lean years, to kick a guy in the pants if he really needed it, to pick up a guy when he was down or even to tell him that he was no longer welcome here. . . . I believe that most of us here at [company name omitted] operate this way regarding appraisals.51

Let’s examine how the rational and the political process differ on various facets of the performance appraisal process.

▪ The goal of appraisal from a rational perspective is accuracy. The goal of appraisal from a political perspective is utility, the maximization of benefits over costs given the context and agenda. The value of performance is relative to the political context and the supervisor’s goals. For example, a supervisor may give a very poor rating to a worker who seems uncommitted in the hopes of shocking that worker into an acceptable level of performance.
▪ The rational approach sees supervisors and workers largely as passive agents in the rating process: Supervisors simply notice and evaluate workers’ performance. Thus, their accuracy is critical. In contrast, the political approach views both supervisors and workers as motivated participants in the measurement process. Workers actively try to influence their evaluations, either directly or indirectly.

The various persuasion techniques that workers use to alter the supervisor’s evaluation are direct forms of influence. For example, just as a student tells a professor that he needs a higher grade to keep his scholarship, a worker might tell her boss that she needs an above-average rating to get a promotion. Indirect influences are behaviors by which workers influence how supervisors notice, interpret, and recall events,52 ranging from flattery to excuses to apologies. The following quote from a consulting group manager demonstrates how employees in the organization used impression-management tactics:53

Phone calls from customers praising a consultant’s performance were rarely received except during the month before appraisals. These phone calls were often instigated by the consultants to highlight their importance.

▪ From a rational perspective, the focus of appraisal is measurement. Supervisors are flesh-and-blood instruments54 who must be carefully trained to measure performance meaningfully. The evaluations are used in decisions about pay raises, promotions, training, and termination. The political perspective sees the focus of appraisal as management, not accurate measurement. Appraisal is not so much a test that should be fair and accurate as it is a management tool with which to reward or discipline workers.
▪ Assessment criteria, the standards used to judge worker performance, also differ between the rational and political approaches. The rational approach holds that a worker’s performance should be defined as clearly as possible. In the political approach, the definition of what is being assessed is left ambiguous so that it can be bent to the current agenda. Thus, ambiguity ensures the necessary flexibility in the appraisal system.
▪ Finally, the decision process differs between the rational and political approaches. In the rational approach, supervisors make dimensional and overall assessments based on specific behaviors they have observed. For instance, Nancy would rate each programmer on each dimension and then combine all the dimensional ratings into an overall evaluation. In the political approach, appropriate assessment of specifics follows the overall assessment. Thus, Nancy would first decide who in her group should get the highest rating (for whatever reason) and then justify that overall assessment by making appropriate dimensional ratings.

Appraisal in most organizations seems to be a political rather than a rational exercise.55 It appears to be used as a tool for serving various and changing agendas; accurate assessment is seldom the real goal. But should the rational approach be abandoned because appraisal is typically political? No! Politically driven assessment may be common, but that does not make it the best approach to assessment.

Accuracy may not be the main goal in organizations, but it is the theoretical ideal behind appraisal.56 Accurate assessment is necessary if feedback, development, and HR decisions are to be based on employees’ actual performance levels. Basing feedback and development on managerial agendas is an unjust treatment of human resources. Careers have been ruined, self-esteem lost, and productivity degraded because of the political use of appraisal. In addition to these negative effects, politically driven appraisal is also associated with increased intention of workers to quit their jobs.57 Such costs are difficult to assess and to ascribe clearly to politics. Nonetheless, they are very real and important for workers.

Individual or Group Focus

If the organization has a team structure, managers need to consider team performance appraisal at two levels: (1) individual contribution to team performance and (2) the performance of the team as a unit.58 To properly assess individual contributions to team performance, managers and employees must have clear performance criteria relating to traits, behaviors, or outcomes. Behavioral measures are typically most appropriate for assessing individual contributions to team performance because they are more easily observed and understood by team members and others who interact with the team.

The individual contribution measures could be developed with the input of team members. However, a good starting point is the set of competencies for individual contribution to team performance identified in recent research.59 The following example describes the use of these competencies at Pfizer, a large pharmaceutical company. Peers assess team members online in the finance area of Pfizer.60

The assessment is based on a four-dimensional model of collaboration, communication, self-management, and decision making. Feedback reports are used as a discussion point to improve the functioning of teams. Over time, there has been significant improvement in the average level of ratings given to team members.

Whatever measures are already in existence or are developed for measuring team performance, here are some points to keep in mind.

First, the measurement system needs to be balanced. For example, although financial objectives may be apparent and easy to develop as criteria, these kinds of objectives may not reflect the concerns of customers.

Another point to keep in mind is that outcome measures may need to be complemented with measures of process. For example, achieving a result may be important but so, too, are interpersonal relations. With a balance of measures, it should be clear to team members that achieving outcomes by running roughshod over peers and customers is not acceptable performance.

Assessing the performance of a team as a unit means that managers must measure performance at the team, not individual, level. Dimensions for measuring team performance may be set at higher levels in the organization; if this is not the case, then team members can be great sources for identifying and developing team-level criteria. Going to team members to help develop criteria encourages their participation in selecting measures that they feel they can directly influence.

Overall, given the individual focus in the United States, it is recommended that individual performance still be assessed, even in a team environment.61 As with individual assessment, there is no consensus as to what type of appraisal instrument should be used for team evaluations. The best approach may include internal and external customers making judgments across both behavioral and outcome criteria.62

Legal Issues

The major legal requirements for performance appraisal systems are set forth in Title VII of the Civil Rights Act of 1964, which prohibits discrimination in all terms and conditions of employment (see Chapter 3). This means that performance appraisal must be free of discrimination at both the individual and group levels. Some courts have also held that performance appraisal systems should meet the same validity standards as selection tests (see Chapter 5). As with selection tests, adverse impact may occur in performance evaluation when members of one group are promoted at a higher rate than members of another group based on their appraisals.
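A rough screen for adverse impact in appraisal-based promotions compares each group's promotion rate with the highest group's rate. The sketch below assumes the four-fifths (80 percent) rule of thumb from the EEOC Uniform Guidelines on Employee Selection Procedures as the threshold; the passage above does not name a threshold, so that choice, along with the group labels, the counts, and the function name `impact_ratios`, is an assumption for illustration. A ratio below the threshold is a signal to investigate, not a legal finding.

```python
def impact_ratios(promoted, eligible):
    """Return each group's promotion rate divided by the highest group's rate."""
    rates = {group: promoted[group] / eligible[group] for group in eligible}
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no promotions recorded for any group")
    return {group: rate / highest for group, rate in rates.items()}


# Illustrative counts only (hypothetical groups and numbers).
eligible = {"Group A": 100, "Group B": 50}
promoted = {"Group A": 30, "Group B": 10}

THRESHOLD = 0.8  # assumed four-fifths rule of thumb

ratios = impact_ratios(promoted, eligible)
flagged = [group for group, ratio in ratios.items() if ratio < THRESHOLD]

for group, ratio in ratios.items():
    print(f"{group}: impact ratio = {ratio:.2f}")
print("Groups below threshold:", flagged)
```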

Probably the most significant court test of discrimination in performance appraisal is Brito v. Zia Company, a 1973 case decided by the U.S. Court of Appeals for the Tenth Circuit. In essence, the court determined that appraisal is legally a test and must, therefore, meet all the legal requirements regarding tests in organizations. In practice, however, court decisions since Brito v. Zia have employed less stringent criteria when assessing charges of discrimination in appraisal.

Appraisal-related court cases since Brito v. Zia suggest that the courts do not wish to rule on whether appraisal systems conform to all accepted professional standards (such as whether employees were allowed to participate in developing the system).63 Rather, they simply want to determine whether discrimination occurred. The essential question is whether individuals who have similar employment situations are treated differently.

The courts look favorably on a system in which a supervisor’s manager reviews appraisals to safeguard against the occurrence of individual bias. In addition, the courts take a positive view of feedback and employee counseling to help improve performance problems. An analysis of 295 court cases involving performance appraisal found judges’ decisions to be favorably influenced by the following additional factors:64

▪ Use of job analysis
▪ Providing written instructions
▪ Allowing employees to review appraisal results
▪ Agreement among multiple raters (if more than one rater was used)
▪ The presence of rater training

In the extreme, a negative performance appraisal may lead to the dismissal of an employee. Management’s right to fire an employee is rooted in a legal doctrine called employment-at-will. Employment-at-will is a very complex legal issue that depends on laws and rulings varying from state to state. We discuss employment-at-will more fully in Chapter 14. Here, the major point is that managers can protect themselves from lawsuits by following good professional practice. If they provide subordinates with honest, accurate, and fair feedback about their performance, and then make decisions consistent with that feedback, they will have nothing to fear from ongoing questions about employment-at-will.
