7. Displaying Reputation

How to Use a Reputation: Three Questions

For each reputation you are creating to display or use, you should ask each of these questions before proceeding:

Who will be able to see the reputation?
- Is it personal—hidden from other users but visible to the reputation holder?
- Is it public—displayed to friends or strangers, or visible to search engines?
- Is it corporate—limited to internal use—for improving the site or discreetly recognizing outliers in ways that may not be visible to the community?
How will the reputation be used to modify your site’s output?
- Will you use the reputation to filter the lowest- or highest-quality items in a set?
- Will you use the reputation to sort or rank items?
- And/or will this score be used to make other decisions about how the site flows or your business operates?
Is this reputation for a content item or a person? Each requires a fundamentally different approach.

Though you may choose multiple answers from this list for each reputation, try to keep it simple at first: don’t try to do too much with a single reputation. Confounding the purposes of a reputation—by, for example, surfacing participation points in a public karma score—can encourage undesirable user behavior and may even backfire by discouraging participation. Read Chapters 7 and 8 completely for a solid understanding of the issues related to overloading a single reputation.

Caution

Resist the temptation to treat a single reputation score as the cure-all for your user-generated content incentive ills. Remember the lesson of the FICO score in FICO: A Study in Global Reputation and Its Challenges.

Who Will See a Reputation?

So far, the reputation you’re calculating is little more than a cold numerical score rolled up from the aggregate actions of people interacting with your site. You’ve carefully determined the scope of the reputation, chosen the inputs that contribute to it, and thought at length about the effect that you want the reputation to generate in the community.

Now you must decide whether it makes sense to display the reputation on your site at all and, if so, to whom. How you display reputation information—how much and how prominently—will influence the actions that users take on your site, their trust in your site and one another, and their long-term satisfaction with your community.

To Show or Not to Show?

Compelling reasons exist to keep reputations hidden from users. In fact, in some circumstances, you may want to obscure the fact that you’re tracking them at all. It may sound rather Machiavellian, but the truth of the matter is this: a community under public scrutiny behaves differently (and, in many ways, less honestly) than one in blissful ignorance.

Several trade-offs are involved. Displaying reputations takes up significant page real estate, requires user interface design and testing, and can compete with your content for the user’s attention and understanding. Quickly, show Digg.com (Figure 7-1) to 10 of your friends and ask them, “What kind of site is this? News? Entertainment? Community?” Odds are good that at least a few of them will answer: “This appears to be some sort of contest.”

Figure 7-1. Digg’s site design puts overt reputation scores front and center.

The impression that Digg makes is not a bad thing; it just demonstrates that Digg made a conscious decision to display content reputation prominently. In fact, the display of reputation is the central interaction mechanism on the site. It’s practically impossible to interact with Digg, or get any use out of it, without some understanding of how community voting affects the selection and display of popular items on the site. (Digg is perhaps the most well-known example of a site that employs the Vote-to-Promote pattern. See Chapter 6.)

Juxtapose Digg’s approach with that of Flickr. The popular photo-sharing and discovery service also uses reputation to surface quality content, but it does not display explicit reputations. Instead, it prominently displays items that achieve a certain reputation, which can be browsed (daily, weekly, or monthly) in the “Explore” gallery (at http://www.flickr.com/explore); see Figure 7-2. The result is a very consistent and impressive display of high-quality photos with very little indication of how those photos were selected.

Figure 7-2. Flickr’s “Explore” gallery is also based on reputation, but you never see a score associated with a photo.

Flickr’s interestingness algorithm determines which photos make it into the “Explore” gallery and which don’t. The same algorithm lets users sort their own photos by interestingness.

Digg and Flickr represent two very different approaches to reputation display, but the results are very much the same. Theoretically, you can always glance at the front page of Digg or Flickr’s “Explore” gallery to see where the good stuff is—what people are watching, commenting on, or interacting with the most on the site.

How do you decide whether to display reputations on your site? And how prominently? Generally, follow the rule of least disclosure: do not display a reputation that doesn’t add specific value to the objects being evaluated.

Likewise, don’t bother asking users for reputation input (see Chapter 6) that you’ll never use; you’ll confuse users and encourage undesired patterns of “invented significance,” including abuse.

Caution

Avoid collecting reputation for display only. Orkut allowed users to rate other users explicitly on iconic criteria like “trusty,” “cool,” and “sexy” for no use other than display. This use of reputation caused all kinds of social backlash.

People were either disappointed that they weren’t rated “cool” by more people, or they were creeped out by people of the same gender calling them sexy. Eventually, Orkut removed the display of individual friends’ ratings and kept only the aggregate scores.

Irrelevant reputations are meaningless and consume valuable resources. If you don’t have a relevant use for a reputation, beware of sticking yourself later with the tough choice of either awkwardly removing a failed feature or having to support it as a costly legacy element.

Personal Reputations: For the Owner’s Eyes Only

Are you tracking a reputation primarily to keep users informed about how well they or their creations are performing in the community? Consider displaying that reputation only to its owner, as a personal communication between site and user.

We use the word personal very deliberately here, distinguishing it from private. No reputation system is truly private; at least one other party (typically the site operator) will almost always have access to the actions, inputs, and roll-ups that formulate a user’s score. In fact, you may store internally used reputations (see Corporate Reputations Are Internal Use Only: Keep Them Hush-hush) that are largely based on the exact same data.

In other words, reputations may be displayed in a personal context, but that’s no guarantee that they’re private. As a service provider, you should acknowledge that distinction and account for it in your terms of service.

Personal reputations are used extensively for applications such as social bookmarking, lists of favorites, training recommendation systems, sorting and filtering news feeds, providing content quality and feedback, fine-grained experience point tracking, and other performance metrics. Most of the same user interface patterns for displaying public reputation apply to personal ones, too, but take care to ensure that each user knows when her reputations will and will not be displayed to others.

Tip

Keep a reputation personal when its owner gains some significant benefit from it—when it either improves his experience of the site (that is, personalizes it) or provides a tool for increasing self-satisfaction. For example, by selecting news stories about various sports teams over time, a user might generate a geographic region reputation that can be used to target advertising displayed to the user. Clearly that reputation should not be public information, but it might be surfaced privately so that the user can correct it—“I’m a fan of Northern California sports teams, but I’m going to MIT and I really want ads for electronics stores in the Boston area.”

Google Analytics (see Figure 7-3) is an example of rich personal reputation information. It provides detailed information about the performance of your website, across a known range of score types, and it is available only to you, the site owner (or others to whom you grant access). While that information is invaluable to you in gauging the response of a community (in this case, the entire Web) to your content, exposing it to everyone would offer very little practical benefit. In fact, it would be a horrible idea.

Figure 7-3. Google’s Analytics interface shows information that is clearly best kept between you and Google. It’s personal.

Personal and Public Reputations Combined

Some reputation display patterns provide both a personal and a public representation. In the named-levels display pattern (see Named levels), the personal representation of the reputation score often is numeric, showing the exact score, whereas the public representation obscures exactly where within the level the target’s score falls. Online games usually report only the level to other users and the exact experience points to the player.
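
The split between a personal and a public view of the same score can be sketched as follows. This is a minimal illustration only: the level names, the point thresholds, and the function names are hypothetical and do not come from any particular game or site.

```python
from bisect import bisect_right

# Hypothetical level thresholds (minimum points) and names, for illustration only.
LEVEL_THRESHOLDS = [0, 100, 500, 2000]
LEVEL_NAMES = ["Novice", "Apprentice", "Expert", "Master"]


def level_name(points: int) -> str:
    """Return the named level whose minimum threshold the score has reached."""
    index = bisect_right(LEVEL_THRESHOLDS, max(points, 0)) - 1
    return LEVEL_NAMES[index]


def display_score(points: int, viewer_is_owner: bool) -> str:
    """Show exact points to the owner; show only the level to everyone else."""
    name = level_name(points)
    if viewer_is_owner:
        return f"{name} ({points} points)"
    return name
```

Here the owner of a 750-point score would see the level together with the exact points, while other users would see only the level name.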

Public Reputations: Widely Visible

When the whole community would benefit from knowing the reputations of either people or content, consider displaying public reputations. Public reputations may be displayed to everyone in the community or only to users who are members of a group, are connected through a social network, or have achieved status as senior, trusted members of the community by surpassing some reputation threshold.

When is it a good idea to display public reputations? Remember our original definition: reputation is information used to make a value judgment about a person or an object in a given context for a specific time. Consider the following questions:

What decisions am I asking users to make on my site?
- Compare items’ quality against one another?
- Determine someone’s credibility or trustworthiness?
- Decide whether something’s worth reading?
Am I asking users to make time-sensitive decisions or decisions in which additional, well-placed information would save them heartache?
Can I present the reputation in a way that is fair and comprehensible and doesn’t overwhelm the presentation of the content?

Public reputations are used for hundreds of purposes on the Web: to compare items in a list on the basis of community member feedback, evaluate particular targets for online transaction trustworthiness, filter and display top-rated message board posts, rank the best local Indonesian restaurants, show today’s gallery of the most interesting photos, display leaderboards of the top-scoring reputation targets, and much more.

Over time, public reputations can evolve to represent your community’s understanding of its own zeitgeist. And there’s the rub: depending on how you use public reputation, you can alienate users who aren’t part of the in crowd. For example, Yelp is all about public ratings and reviews of local restaurants, but it isn’t used extensively by people over 50. Most of the reviews are written by twentysomethings (most “Yelpers” are between the ages of 26 and 35) who seem to be mostly interested in a restaurant’s potential as a dating hangout.

Tip

Public reputations are helpful for allowing users to compare like items. Public karma reputations also serve as an effective extension of a person’s identity.

Corporate Reputations Are Internal Use Only: Keep Them Hush-hush

Almost every website with a large volume of user-generated content is using hidden reputation scores internally—as a means of tracking exactly who is saying what about a content item or another user:

When users click the Spam button in a webmail application, they contribute to a database of IP addresses for abusive mail servers.
Web crawlers constantly scan the Web to examine what sites link to what other sites and to calculate a hidden score such as Google’s PageRank.
Yahoo! Answers tracks corporate reputation for users who are particularly good at identifying bad content and gives them more power to hide bad content quickly.

And internally used reputation scores need not always be acted on immediately by scripts or bots; they can also be a very helpful tool for human decision making. Community managers often use corporate reputation reports on the most active, connected, and highest-quality user contributions and creators. They might use the information to generate high-quality best-of galleries to promote a site, or they might invite top contributors to participate in early testing of new designs, products, or features. Finally, user actions often are aggregated into reputations for behavioral targeting of advertising, customer care planning and budgeting, product feature needs assessment, and even legal compliance.

Tip

Even if your site wouldn’t benefit from any public or personal form of reputation display, you probably need to track corporate (internal) reputation scores to understand what your users are doing, tune your site development, and optimize support costs.

How Will You Use Reputation to Modify Your Site’s Output?

After deciding which reputation scores to display to whom, you’ll need to decide how to use the scores to change the way your application works. It’s easy to think that all you need to do is display a few stars here or a few points there—but if you stopped there, you wouldn’t capture the most value from reputation.

To use reputation without displaying it, focus on how to identify the outlying reputable entities (users and content) to improve the quantity and quality of interaction on your site. When you’re selecting patterns, review the goals you set for your system (see Chapter 5). If you’re primarily concerned about identifying abusive behavior, focus on filtering and decisions. If you’re going to display a lot of public reputation over many entities, focus on ranking and sorting to help users explore your content.

We’ll cover patterns for making use of the reputation of entities in Chapter 8.

Reputation Filtering

At its simplest, filtering consists of sorting by one or more reputation dimensions and looking only at the first or last entries in the list to identify the highest- and lowest-scoring entities for further, even automatic, action. In reality, reputations used for filtering are often made of more numerous and more complex inputs than reputations built for public display in rankings or sorted lists.
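
The simplest form of filtering described above might look like the following sketch. The `Entity` type and the `filter_extremes` function are hypothetical names introduced for illustration, and the score is assumed to be a single normalized value.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    name: str
    score: float  # normalized reputation, assumed to run from 0.0 to 1.0


def filter_extremes(entities, count):
    """Sort by score and return (lowest, highest) lists of up to `count` items each."""
    if count <= 0:
        return [], []
    ranked = sorted(entities, key=lambda entity: entity.score)
    lowest = ranked[:count]            # worst first
    highest = ranked[-count:][::-1]    # best first
    return lowest, highest
```

The lowest-scoring list could feed a moderation queue and the highest-scoring list a gallery, without either score ever being shown to users.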

Consider Flickr’s interestingness filter reputation: it is corporate (used internally and not displayed to any user), it is complex (made up of many inputs: views, favorites, comments, and more), and it is used to automatically and continuously generate a public gallery. But the score is never displayed to users; you cannot query a photo to get its interestingness score. Perhaps the easiest way to think about a filter reputation is that, if it is not ever displayed to users, they don’t have to understand what it’s made up of. If users can see a reputation indicator, they’ll want to know what it means and how it’s calculated.

Tip

In fact, algorithm speculation has become almost a spectator sport on the Web. Name any popular reputation-heavy site (Digg, Amazon, YouTube, and many others), and odds are good that you’ll find any number of threads or forums dedicated to figuring out exactly how its algorithm works.

The reputation usage patterns related to filtering are: user threshold, public gallery, guided learning, recommendations, bookmarks/favorites, similar items, content by author karma, and friends filtering.

Reputation Ranking and Sorting

By far the most common displays of reputation are explicit lists of reputable entities, such as the restaurants in the local neighborhood with the highest average overall rating, the players with the highest Elo ranks for chess, or even the keyword search marketing terms that generate the most clicks per dollar spent.

Typically, the reputation score is used alone or in conjunction with other metadata filters, such as geographic location, to make it easy for users to compare multiple entities at a glance. For example, to list top-rated hotels within a five-mile radius of a zip code, one would combine the distance and reputation into a rank-score before displaying the list.
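
One way to combine distance and reputation into a single rank-score is sketched below. The linear distance decay, the 0.7/0.3 weighting, and the dictionary field names are all illustrative assumptions, not a recommended formula.

```python
def rank_score(rating: float, distance_miles: float,
               radius_miles: float = 5.0, rating_weight: float = 0.7) -> float:
    """Blend a normalized rating (0.0 to 1.0) with proximity into one rank-score.

    The linear decay and the default weighting are assumptions for illustration.
    """
    proximity = max(0.0, 1.0 - distance_miles / radius_miles)
    return rating_weight * rating + (1.0 - rating_weight) * proximity


def top_rated_nearby(hotels, radius_miles: float = 5.0):
    """Keep hotels inside the radius and sort them by descending rank-score."""
    nearby = [h for h in hotels if h["distance_miles"] <= radius_miles]
    return sorted(
        nearby,
        key=lambda h: rank_score(h["rating"], h["distance_miles"], radius_miles),
        reverse=True,
    )
```

Note that the rank-score itself never needs to be shown; only the resulting order reaches the user.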

The primary purpose of allowing such sorting is to enable users to select an item to examine in more detail. Note that the reputation score need not be displayed to allow sorting or ranking entities. For example, to avoid encouraging abuse, public search engines typically hide search ranking scores.

Caution

Any time you sort or rank reputable entities, you’re helping users to sort data into the good and the bad. This is creating value—and wherever value exists, people will be interested in capturing as much of it as possible using whatever means are available. The more successful your reputation ranking is, the more value it creates, and the more some people will want to game your design for their own benefit.

The lesson is that a reputation-based display that works well when a community is small may need to be modified over time as the community becomes more successful. This is a success paradox: the more popular your reputation system becomes, the more likely you’ll see reputation abuse. Keep an eye out for use patterns that don’t contribute to your business and community goals.

Recommender systems use reputation to make suggestions based on similarities between user tastes (“People who like the same things as you do also like…”) and to discover taste similarities between items (“People who liked this item also like…”). They use reputation in the form of confidence scores and typically display multiple entities in rank order when making recommendations. When the user selects a suggested item, that selection itself is also entered in the reputation system to further improve the quality of future results.
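
As a rough illustration of the “People who liked this item also like…” idea, the sketch below ranks items by how often they co-occur with a given item in users’ likes. The raw co-occurrence count is a crude stand-in for the confidence scores a real recommender would compute, and the function and data layout are hypothetical.

```python
from collections import Counter


def also_liked(likes_by_user: dict, item: str, limit: int = 3) -> list:
    """Rank other items by how many users liked them together with `item`."""
    counts = Counter()
    for liked_items in likes_by_user.values():
        if item in liked_items:
            counts.update(other for other in liked_items if other != item)
    return [name for name, _ in counts.most_common(limit)]
```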

The specific reputation usage patterns related to ranking and sorting are quality-sort search results, leaderboards, related items, recommendations, search relevance (such as Google’s PageRank), corporate community health metrics, and advertising performance metrics.

Reputation Decisions

This entire class of use patterns often is overlooked because it typically happens behind the scenes, out of users’ sight. Though you may not be aware of it, more hidden decisions are made on the basis of reputation than are actually reflected directly to users, either with filtering or ranking.

Billions of email messages are processed daily across the world. ISPs secretly track the IP addresses of the senders; they use this reputation to decide whether the item should be dropped, put in a bulk folder, or sent on to another content-based reputation check before being delivered to your inbox. This is only one example of many patterns used by Web 2.0 site operators around the world to manage user-generated content without exposing the scores or the methods for their calculations. When used for abuse mitigation, the value of the reputation score can be directly correlated with cost savings from increased efficiency in customer care and community management, as well as in hardware and other operational costs. Each year, the IP reputation system for Yahoo! Mail saves tens of millions of dollars in real costs for servers, storage, and overhead.
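
The three-way decision described above (drop, bulk folder, or pass on to a content-based check) can be sketched as a simple threshold function. The thresholds and the 0.0 to 1.0 score range are hypothetical; real mail systems tune such values against their own traffic.

```python
def route_message(sender_ip_reputation: float,
                  drop_below: float = 0.1, bulk_below: float = 0.4) -> str:
    """Map a sender-IP reputation (0.0 = worst, 1.0 = best) to a delivery action.

    Both thresholds are illustrative assumptions.
    """
    if sender_ip_reputation < drop_below:
        return "drop"
    if sender_ip_reputation < bulk_below:
        return "bulk_folder"
    return "content_check"
```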

When a reputation score is complex, such as karma (see the next section), it may be suitable for public display as a standalone score so that others can make specific, context-sensitive decisions. eBay’s feedback and other reputation scores are a good example of a publicly shared karma. Since the transactions for items are often one of a kind, content filtering and ranking don’t provide enough information for anyone to make a decision about whether to trust the seller or buyer.

Of course, some reputation is nonnumeric and can’t be ranked at all—for example, comments, reviews, video responses, and personal metadata associated with source users who evaluate your entities. These forms of input must be displayed so that users can interpret the input directly. For instance, a 20-year-old single woman in Los Angeles who is looking for a new sweater might want to discount the ratings given by a 50-year-old married man living in Alaska. Nonnumeric reputation often provides just enough additional context for people to make more informed judgments about entities.

Here are the specific reputation usage patterns related to decisions: critical threshold, automatic rejection, flag for moderation, flag for promotion, and reviews and comments.

Content Reputation Is Very Different from Karma

Reputable entity refers to everything in a database, including users and content items, with one or more reputations attached to it. All kinds of reputation score types and all kinds of displays and use patterns might seem equally valid for content reputation and karma, but usually they’re not. To highlight the differences between content reputation and karma, we’ve categorized them by the ways in which they’re typically calculated:

Simple reputation: Simple reputation is any reputation score that is generated directly by user evaluation of a reputable entity and that is subject to an elementary aggregation calculation, such as simple average. For example, simple reputation is used on most ratings-and-reviews sites. Simple reputation is direct and easy to understand.
Complex reputation: Complex reputation is a score aggregated from multiple evaluations, including evaluations of different but related targets, calculated with an opaque method. Email IP spammer, Google PageRank, and eBay feedback reputations are examples of complex reputation. It’s an indirect evaluation, and users may not understand how it was calculated, even if the score is displayed.

Content Reputation

Content reputation scores may be simple or complex. The simpler the score is (that is, the more directly it reflects the opinions or values of users), the more ways you can consider using and presenting it. You can use it for filtering, sorting, and ranking, and in many kinds of corporate and personalization applications. On most sites, content reputation does the heavy lifting of helping you to find the best and worst items for appropriate attention.

Tip

When displaying content reputation, avoid putting too many different scores of different types on a page. For example, on the Yahoo! TV episode page, a user can give an overall star rating to a TV program and a thumb vote on an individual episode of the program. Examination of the data showed that many visitors to the page clicked the thumb icons when they meant to rate the entire show, not just an episode.

Karma

Content reputation is about things: typically inanimate objects without emotions or the ability to respond directly in any way to their reputation.

But karma represents the reputation of users, and users are people. They are alive, they have feelings, and they are the engine that powers your site. Karma is significantly more personal and therefore sensitive and meaningful. If a manufacturer gets a single bad product review on a website, it probably won’t even notice. But if a user gets a bad rating from a friend—or feels slighted or alienated by the way your karma system works—she might abandon an identity that has become valuable to your business. Worse yet, she might abandon your site altogether and take her content with her. (Worst of all, she might take others with her.)

Take extreme care in creating a karma system. User reputation on the Web has undergone many experiments, and the primary lesson from that research is that karma should be a complex reputation and it should be displayed rarely.

Karma is complex, built of indirect inputs

Sometimes making things as simple and explicit as possible is the wrong choice for reputation:

Rating a user directly should be avoided. Typical implementations require a user to click only once to rate another user and are therefore prone to abuse. When direct evaluation karma models are combined with the common practice of streamlining user registration processes (on many sites opening a new account is an easier operation than changing the password on an existing account), they get out of hand quickly. See the example of Orkut in Numbered levels.
Asking people to evaluate others directly is socially awkward. Don’t put users in the position of lying about their friends.
Using multiple inputs presents a broader picture of the target user’s value.
Economics research into “revealed preference,” or what people actually do, as opposed to what they say, indicates that actions provide a more accurate picture of value than elicited ratings.
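
A karma score built from several indirect inputs, rather than from direct user-to-user ratings, might be sketched like this. The signal names and weights are hypothetical, and each signal is assumed to be already normalized to the range 0.0 to 1.0.

```python
# Hypothetical indirect signals and weights, for illustration only.
KARMA_WEIGHTS = {
    "content_quality": 0.5,   # e.g., roll-up of ratings on the user's contributions
    "helpful_flags": 0.3,     # e.g., share of the user's abuse reports that were upheld
    "account_tenure": 0.2,    # e.g., capped measure of how long the account has been active
}


def karma(signals: dict) -> float:
    """Weighted mean of indirect signals; a missing signal counts as 0.0."""
    total_weight = sum(KARMA_WEIGHTS.values())
    weighted = sum(
        weight * signals.get(name, 0.0) for name, weight in KARMA_WEIGHTS.items()
    )
    return weighted / total_weight
```

Because no single click moves the score very far, a design along these lines is harder to abuse than a one-click rating of another user.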

Karma calculations are often opaque

Karma calculations may be opaque because the score is valuable as status, has revenue potential, and/or unlocks privileged application features.

Display karma sparingly

There are several important things to consider when displaying karma to the public:

Publicly displayed karma should be rare because, as with content reputation, users are easily confused by the display of many reputations on the same page or within the same context.
Publicly displayed karma should be rare because it can create the wrong incentives for your community. Avoid sorting users by karma. See Leaderboards Considered Harmful.
If you do display it publicly, make karma visually distinct from any nearby content reputation. Yahoo!’s EU message board displays the karma of a post’s author as a colored medallion, with the message rated with stars. But consider this: Slashdot’s message board doesn’t display the karma of post authors to anyone. Even the display of a user’s own karma is vague: “positive,” “good,” or “excellent.” After originally displaying karma publicly as a number, over time Slashdot has shifted to an increasingly opaque display.
Publicly displayed karma should be rare because it isn’t expected. When Yahoo! Shopping added Top Reviewer karma to encourage review creation, it displayed a Top Reviewer badge with each review and rushed it out for the Christmas 2006 season. After the New Year had passed, user testing revealed that most users didn’t even notice the badges. When they did notice them, many thought they meant either that the item was top rated or that the user was a paid shill for the product manufacturer or Yahoo!.

Karma caveats

Though karma should be complex, it should still be limited to as narrow a context as possible. Don’t mix shopping review karma with chess rank. It may sound silly now, but you’d be surprised how many people think they can make a business out of creating an Internet-wide trustworthiness karma.

Yahoo! holds reputation for karma scores to a higher standard than reputation for content. Be very careful in applying terminology and labels to people, for a couple of reasons:

Avoid labels that might appear as attacks. They set a hostile tone that will be amplified in users’ responses. This caution applies both to overly positive labels (such as “hotshot” or “top” designations) and to negative ones (such as “newbie” or “rookie”).
Avoid labels that introduce legal risks. What if a site labeled members of a health forum “experts,” and these “experts” then gave out bad advice?

These are rules of thumb that may not necessarily apply to a given context. In role-playing games, for example, publicly shared simple karma is displayed in terms of experience levels, which are inherently competitive.

Reputation Display Formats

Reputation data can be displayed in numerous formats. By now, you’ve actually already done much of the work of selecting appropriate formats for your reputation data, so we’ll simply describe pros and cons of a handful of them—the formats in most common use on the Web.

The formats you select will depend heavily on the types of inputs that you decided on in Chapter 6. If, for instance, you’ve opted to let users make explicit judgments about a content item with 5-star ratings, it’s probably appropriate to display those ratings to the community in a similar format.

However, that consistency won’t work when the reputation you want to display is an aggregation or transformation of scores derived from very different input methods. For instance, Yahoo! Movies provides a critic’s score as a letter grade compiled from scores from many professional critics, each of whom uses a different scale (some use 4- or 5-star ratings, some thumb votes, and still others use customized iconic scores). Such scores are all transformed into normalized scores, which can then be displayed in any form.
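
The idea of transforming scores from different scales into one normalized value and then displaying it in a new form can be sketched as follows. The normalization functions and the letter-grade cutoffs are illustrative assumptions and do not describe how Yahoo! Movies actually computes its grades.

```python
def normalize_stars(stars: float, max_stars: int) -> float:
    """Map a star rating onto 0.0 to 1.0 (assumes a simple stars/max ratio)."""
    return stars / max_stars


def normalize_thumb(thumb_up: bool) -> float:
    """Map a thumb vote onto 0.0 (down) or 1.0 (up)."""
    return 1.0 if thumb_up else 0.0


# Hypothetical cutoffs, checked from the highest grade down.
GRADE_CUTOFFS = [(0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D")]


def letter_grade(normalized_scores) -> str:
    """Average normalized scores and convert the mean to a letter grade."""
    if not normalized_scores:
        raise ValueError("at least one score is required")
    mean = sum(normalized_scores) / len(normalized_scores)
    for cutoff, grade in GRADE_CUTOFFS:
        if mean >= cutoff:
            return grade
    return "F"
```

For instance, a 3-of-4-star review, a 4-of-5-star review, and a thumbs-up normalize to 0.75, 0.8, and 1.0, which average to 0.85.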

Here are the four primary data classes for reputation claims:

Normalized score

Most composite reputations are represented as decimal numbers from 0.0 to 1.0, with all inputs converted, or normalized, to this range. (See Chapter 6 for more on the specific normalization functions.) Displaying a reputation in the various forms presented in the remainder of this chapter is also known as denormalization: the process of converting reputation data into a presentable format.
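
Denormalization in the sense used here can be as small as the following sketch, which converts a normalized score into a star value for display. Rounding to the nearest half star is an assumption made for illustration.

```python
def denormalize_to_stars(score: float, max_stars: int = 5) -> float:
    """Convert a normalized score (0.0 to 1.0) to stars, rounded to the nearest half star."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("normalized score must be between 0.0 and 1.0")
    return round(score * max_stars * 2) / 2
```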

Summary count, raw score, and other transitional values

Sometimes a reputation must hold other numeric values to better represent the meaning of the normalized score when it is displayed. For example, in a simple-mean reputation, the summary count of the inputs that contribute to the reputation is also tracked, allowing display patterns that can override or modify the score. A pattern could, for instance, require a minimum number of inputs (see Liquidity: You Won’t Get Enough Input).
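
A display pattern that uses the summary count to override the score might look like this sketch. The minimum of 10 inputs and the placeholder wording are hypothetical.

```python
def display_average(normalized_score: float, input_count: int,
                    min_inputs: int = 10) -> str:
    """Show the score only once enough inputs have been collected."""
    if input_count < min_inputs:
        return "Not enough ratings yet"
    return f"{normalized_score * 100:.0f}% ({input_count} ratings)"
```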

In cases where information may be lost during the normalization process, the original input value, or raw score, should also be stored. Finally, other related or transitional values may also be available for display, depending on the reputation statement type. For example, the simple average claim type keeps the rolling sum of the previous ratings along with a counter as transitional values in order to rapidly recompute the average when new ratings arrive.
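
The rolling sum and counter can be sketched as a small class. The class and attribute names are hypothetical; the point is that the average is recomputed from two stored values rather than from the full rating history.

```python
class SimpleAverage:
    """Simple-average claim that stores a rolling sum and a count as transitional values."""

    def __init__(self):
        self.rolling_sum = 0.0
        self.count = 0

    def add_rating(self, normalized_rating: float) -> float:
        """Fold in one new rating and return the updated average."""
        self.rolling_sum += normalized_rating
        self.count += 1
        return self.average

    @property
    def average(self) -> float:
        """Current mean, or 0.0 when no ratings have arrived yet."""
        return self.rolling_sum / self.count if self.count else 0.0
```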

Freeform content

Freeform inputs provided by users may be constrained along certain dimensions, such as format or length, but they are otherwise completely up to the users’ discretion. Some examples of this class of data are user comments and video responses. Notice that an item like the title of a product review (if the review writer is given the option to provide one) is also a freeform element; it gives review writers an opportunity to provide an opinion about a target. Content tags are also a type of freeform content element.

Freeform content is a notable class of data because, although deriving computable values from it is more difficult, users themselves can derive a lot of qualitative benefit from it.

Tip

At Yahoo!, study after study has shown that when users read reviews by other community members (whether the reviews cover movies, albums, or other products), it’s the body of the review that users pay the most attention to. The stars and the number of favorable votes matter, but people trust others’ words first and foremost. They want to trust an opinion based on shared affinity with the writer or on how well the writer expresses it. Only then will they give attention to the other stuff.

Metadata

Sometimes, machine-understood information about an object can yield insight into its overall quality or standing within a community. For comparative purposes, for example, you might want to know which of two different videos was available first on your site. Examples of metadata relevant to reputation include the following:

Timestamp
Geographical coordinates
Format information, such as the length of audio, video, or other media files
The number of links to an item or the number of times the item itself has been embedded in another site

Reputation Display Patterns

Once you’ve decided to display reputation, your decision does not end there. There are a number of possible display patterns for showing reputation (and they may even be used in combination). Some of the more common patterns are discussed in the upcoming sections.

Normalized Score to Percentage

A normalized score ranges from 0.0 to 1.0 and represents a reputation that can be compared to other reputations no matter what forms were used for input. When displaying normalized scores to users, convert them to percentages (multiply by 100.0), the numeric form most widely understood around the world. From here on, we assume this transformation when we discuss display of a percentage or normalized score to users.

The percentage may be displayed as a whole number or with fixed decimal places, depending on the statistical significance of your reputation and user interface and layout considerations. Remember to include the percent symbol (%) to avoid confusion with the display of either points or numbered levels.

Things to consider before displaying percentages:

Use this format when the normalized reputation score is reasonably precise and accurate. For example, if hundreds or thousands of votes have been cast in an election, displaying the exact average percentage of affirmative and negative votes is easier to understand than just the total of votes cast for and against.
Be careful how you display percentages if the input claim type isn’t suitable for normalized output of the aggregated results. For example, consider displaying the results of a series of thumb votes; though you can display the thumb graphic that got the majority of votes, you’ll probably still want to display either the raw votes for each or the percentages of the total up votes and down votes.
Figure 7-4 displays content reputation as the percentage of thumbs-up ratings given on Yahoo! Television for a television episode. Notice that the simple average calculation requires that the total number of votes be included in the display to allow users to evaluate the reliability of the score.

Consider that a graphical sliding scale or thermometer view will make the reputation easier to understand at a glance. If necessary, also display the numeric value alongside the graphic.

Figure 7-5 shows a number of Okefarflung’s karma scores as percentage bars, each representing his reputation with various political factions on World of Warcraft. Printed over each bar is one of the current named levels (see the next section Named levels) in which his current reputation falls.
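
The considerations in this section can be sketched in a few lines of Python. The function below converts a normalized score to a percentage label, always shows the number of inputs, and suppresses the percentage until enough inputs have accumulated. The function name, the min_inputs default of 10, and the wording of the labels are assumptions chosen for illustration; the right threshold depends on your own data.

```python
def format_percentage(normalized_score: float, input_count: int,
                      min_inputs: int = 10, decimals: int = 0) -> str:
    """Render a normalized score (0.0-1.0) as a percentage label.

    min_inputs is a hypothetical threshold below which the percentage
    is withheld because it would be misleading.
    """
    if not 0.0 <= normalized_score <= 1.0:
        raise ValueError("score must be normalized to the range 0.0-1.0")
    if input_count < min_inputs:
        # Too few inputs: show the count only, not a precise-looking percentage.
        return f"Not enough ratings yet ({input_count})"
    percent = normalized_score * 100.0
    # Always include the percent symbol and the number of inputs.
    return f"{percent:.{decimals}f}% ({input_count} votes)"


print(format_percentage(0.873, 1200))  # a well-supported score
print(format_percentage(1.0, 1))       # a single yes vote is not shown as 100%
```

Keeping decimals at 0 by default reflects the caution about implying more precision than the inputs support.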

Pros

Percentage displays of normalized scores are universally understood.
Is Web 2.0 API- and spreadsheet-friendly.
Implementation is trivial. This is often the primary reason this approach is considered.

Cons

Percentages aren’t accurate for very small sample sizes and therefore can be misleading. One yes vote shouldn’t be expressed as “100.00% of votes tallied are in favor....” Consider suppressing percentage display until a reasonable number of inputs have accumulated, adjusting the score, or at least displaying the number of inputs alongside the average.
As with accuracy, precision entails various challenges: displaying too many decimal digits can lead users to make unwarranted assumptions about accuracy. Also, if the input was from level-based or nonlinear normalization or irregular distributions, average scores can be skewed.
Lots of numbers on a page can seem impersonal, especially when they’re associated with people.

Figure 7-4. Content example: normalized percentages with summary count.

Figure 7-5. Karma example: percentage bars with named levels.

Points and Accumulators

Points are a specific example of an accumulator reputation display pattern: the score simply increases or decreases in value over time, either in single steps (one at a time) or by arbitrary amounts. Accumulator values are almost always displayed as digits, usually alongside a units designation, for example, 10,000XP or Posts: 1,429. The aggregation of the Vote-to-Promote input pattern is an accumulator.

If an accumulator has a maximum value that is understood by the reputation system, an alternative is to display it using any of the display patterns for normalized scores, such as percentages and levels.

Using points and accumulators:

Display counts of actions collected from many users, such as voting and favorites.
Figure 7-6 shows an entry from Digg.com, which displays two different accumulators: the number of Diggs and Comments. Note the Share and Bury buttons. Though these affect the chance that an entity is displayed on the home page, the counts for these actions are not displayed to the users.
Publicly display points when you wish to encourage users to take actions that increase or decrease the value for an entity.
Figure 7-7 shows a typical participation-points-enabled website, in this case Yahoo! Answers. Points are granted for a very wide range of activities, including logging in, creating content, and evaluating others’ contributions. Note that this miniprofile also displays a numbered level (see Numbered levels) to simplify comparison between users. The number of points accumulated in such systems can get pretty large.

Alternatively, consider keeping a point value personal and presenting any public display as either a numbered or a named level.
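
A minimal Python sketch of an accumulator follows, assuming an optional cap and a periodic decay step (the two remedies this section suggests for runaway point totals). The class name, method names, and the decay-by-fraction approach are assumptions for illustration only. When a cap is known, the value can also be normalized and handed to any of the normalized-score display patterns.

```python
from typing import Optional


class PointAccumulator:
    """Accumulator with an optional cap and decay (hypothetical design)."""

    def __init__(self, cap: Optional[int] = None) -> None:
        self.points = 0
        self.cap = cap  # None means the accumulator is unbounded

    def add(self, amount: int) -> int:
        """Add (or subtract) points, respecting the cap if one is set."""
        self.points += amount
        if self.cap is not None:
            self.points = min(self.points, self.cap)
        return self.points

    def decay(self, keep_fraction: float) -> int:
        """Keep only a fraction of the points, for example once per period."""
        self.points = int(self.points * keep_fraction)
        return self.points

    def normalized(self) -> float:
        """Score in the range 0.0-1.0; only meaningful when a cap exists."""
        if self.cap is None:
            raise ValueError("an uncapped accumulator cannot be normalized")
        return self.points / self.cap

    def label(self, units: str) -> str:
        """Digits alongside a units designation, such as 'Posts: 1,429'."""
        return f"{units}: {self.points:,}"


posts = PointAccumulator()
posts.add(1429)
print(posts.label("Posts"))
```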

Pros

Explicitly displayed point amounts that the user can influence can be a powerful motivator for some users to participate.
Is easy to understand in ranked lists.
Implementation is trivial.

Cons

First-mover effect. If your accumulator has no cap, awards effectively deflate over time as the leading entities continue to accumulate points and increase their lead. New users become frustrated that they can’t catch up, and new (often more interesting) entities receive less attention. Consider caps, decay, or both for your point system.
Encourages minimum-effort, maximum-benefit behavior. The system tells you exactly how many points are associated with your actions in real time. Yahoo! Answers gives 10 points for an answer chosen as the best, and 1 point each to users who rate other people’s answers. Too bad that writing the best answer takes more than 10 times as long as it does to click a thumb icon 10 times.
If you do cap your points, when most of your users reach that cap, you will need to add new activities to justify moving the cap higher. For example, online role-playing games typically extend the level cap along with expanded content for the users to explore.

Figure 7-6. Content example: Digg shows the number of times an item has been “Dugg.” Another example is the count of comments for an item.

Figure 7-7. Karma example: Yahoo! Answers awards points mostly for participation.

Statistical Evidence

One very useful strategy for reputation display is to use statistical evidence: simply include as many of the inputs in a content item’s reputation as possible, without attempting to aggregate them in visible scores. Statistical evidence lets users zero in on the aspects of a content item that they consider the most telling. The evidence might consist of a series of simple accumulator scores:

Number of views
Number of links
Number of comments
Number of times marked as a favorite or voted on

Using statistical evidence:

Use this display format when a variety of data points would provide a well-rounded view of an entity’s worth or performance.
Figure 7-8 shows YouTube.com’s many different statistics associated with each video, each subject to different subjective interpretation. For example, the number of times a video is Favorited can be compared to the total number of Views to determine relative popularity.
Use statistical evidence in displays of counts of actions collected from many users, such as voting and favorites.
Yahoo! Answers provides a categorical breakdown of statistics by contributor, as shown in Figure 7-9. This allows readers to notice whether the user is an answer-person (as shown here) or a question-person or something else.

Optionally, you might extend statistical evidence to include even more information about how a particular score was derived.

Figure 7-10 shows how Yahoo! Answers displays not only how many people have “starred” a question (that is, found it interesting) but also exactly who starred it. However, displaying that information can have negative consequences: among other things, it may create an expectation of social reciprocity (for example, your friends might become upset if you opted not to endorse their contributions).

Pros

Does not attempt to mediate or frame the experience for users. Lets them decide which reputation elements are relevant for their purposes.

Cons

Can tend to overwhelm an interface, with a dozen factoids and statistics about every piece of content.
Giving too much prominence or weight to statistical evidence in a reputation display may overemphasize the information’s importance. For example, Twitter’s follower counts encourage the hoarding of meaningless connections. (See Leaderboards Considered Harmful.)

Figure 7-8. Content Example: with YouTube’s very powerful “Statistics and Data” you can track a video’s rise in popularity on the site. (Sociologist and researcher Cameron Marlow calls it an “Epidemiology Interface.”)

Figure 7-9. Karma example: Yahoo! Answers enhances point and level information with statistical detail.

Figure 7-10. Yahoo! Answers displays the sources for statistical evidence.

Levels

Levels are reputation display patterns that remove insignificant precision from the score. Each level is a bucket holding all the scores in a range. Levels allow you to round off the results and simplify the display. Notice that the range of scores in each level need not be evenly distributed, as long as the users understand the relative difficulty of reaching each level.

Common display patterns for levels include numbered levels and named levels.

When using levels:

Use levels when the reputation is an average and inputs are limited to a small, fixed set, such as 5 stars.
Levels are helpful when the reputation is an average and may be calculated from a very small number of inputs. Levels will hide irrelevant precision.
Most applications use levels when reputation accumulates at a nonlinear rate. For example, in many role-playing games, each experience level requires twice as many experience points as the previous level.
Use levels if some features of your application are unlocked depending on the reputation score; users will want to know that they’ve achieved the required threshold.
Be careful using levels when the input was gathered using a different scale. If the user clicks a thumb icon, displaying the resulting score as 5 stars will be confusing.
Be careful when listing entities by level not to surface relative position within a level. Doing so can encourage undesired competition for specific page positions. Sort by the lower precision level value, not the high precision normalized value.
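
The bucketing and sorting advice above can be sketched in Python as follows. This example assumes a doubling scheme in which each level threshold is twice the previous one, and it sorts a list by level with a neutral tie-breaker (the entity name) so that position within a level reveals nothing about the underlying score. The function names and the threshold of 100 points are assumptions for illustration.

```python
from bisect import bisect_right


def doubling_thresholds(first_threshold: int, level_count: int) -> list[int]:
    """Minimum points for levels 2..level_count, each double the previous."""
    return [first_threshold * 2 ** i for i in range(level_count - 1)]


def level_for(points: int, thresholds: list[int]) -> int:
    """Bucket a raw score into a 1-based level."""
    return 1 + bisect_right(thresholds, points)


def rank_by_level(scores: dict[str, int],
                  thresholds: list[int]) -> list[tuple[str, int]]:
    """Sort by the low-precision level, then by name (not by raw score)."""
    leveled = [(name, level_for(points, thresholds))
               for name, points in scores.items()]
    return sorted(leveled, key=lambda pair: (-pair[1], pair[0]))


thresholds = doubling_thresholds(100, 5)  # levels start at 100, 200, 400, 800
print(rank_by_level({"zoe": 399, "amy": 210, "bob": 900}, thresholds))
```

In this sketch, zoe and amy land in the same level, so they are ordered alphabetically even though zoe has more points.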

Numbered levels

Numbered levels are the most basic form of level display. This display pattern consists of a simple numeric value or a list of repeated icons representing the level that the reputation score falls into. Usually levels are 0 or 1 to n, though arbitrary ranges are possible as long as they make sense to users. The score may be an integer or a rounded fraction, such as 3½ stars. If the representation is unfamiliar to users, consider adding an element to the interface to explain the score and how it was calculated. Such an element is mandatory for reputations with nonlinear advancement rates.
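
As one possible way to produce a rounded fraction such as 3½ stars, the Python sketch below maps a normalized score onto a star scale and rounds to the nearest half star. The function names and the choice to round halves upward are assumptions made for this example.

```python
import math


def star_level(normalized_score: float, max_stars: int = 5) -> float:
    """Map a normalized score (0.0-1.0) to the nearest half star."""
    if not 0.0 <= normalized_score <= 1.0:
        raise ValueError("score must be normalized to the range 0.0-1.0")
    # Work in half-star steps and round halves upward.
    half_steps = math.floor(normalized_score * max_stars * 2 + 0.5)
    return half_steps / 2


def star_label(stars: float) -> str:
    """Text form of a half-star level, such as '3½ stars'."""
    whole = int(stars)
    has_half = stars - whole == 0.5
    if has_half:
        number = f"{whole}½" if whole else "½"
    else:
        number = str(whole)
    unit = "star" if stars == 1 else "stars"
    return f"{number} {unit}"


print(star_label(star_level(0.7)))
```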

Using numbered levels:

Assign numbered levels if the reputation will be displayed in a rank-ordered list of entities.
Figure 7-11 shows a typical Stars-and-Bars display pattern for ratings and reviews. Stars and Bars are numbered levels, which happen to be displayed as graphics. In this example, each has a numbered level of 0 to 5. Though each review’s ratings are useful when displayed alongside the entity, the average of the overall score is used to rank-order results on search results pages.
It is typical to use numbered levels to display aggregate reputation if the inputs were also numbered levels. Did you input stars? Then output stars.
Figure 7-12 shows the karma ratings from Orkut.com. The Fans indicator is an accumulator (see Points and Accumulators), and the Trusty, Cool, and Sexy ratings are numeric levels. The users simply click on the smiling faces, ice cubes, and hearts next to their friends’ profiles to influence their scores. Many sites don’t allow direct karma ratings such as these with good reason (see Karma).

If you need to display more than 10 levels, use numbered levels. Consider using numbered levels instead of named levels if you display more than five levels.

Figure 7-13 displays two forms, out of many, of numbered levels for the game World of Warcraft. The user controls a character whose name is shown in the Members column. The first numbered level is labeled “Level” and ranges from 1 to 80, representing the amount of time and skill the user has dedicated to this character. The Guild Rank is a reverse-rank numbered level that represents the status of the user in the guild. This score is assigned by the guild master, who has the lowest guild rank.

Pros

Is easy to read.
Accommodates unlimited values. You can always add more levels at the top.
In ranked lists, relative value is easy to see.

Cons

Numeric format doesn’t convey limits or global value. Is level 20 good? What about 40?
Often requires “What’s this?” user interface elements to explain levels to new users.
Lots of numbers on a page can seem impersonal, especially when they’re associated with people.
For karma, numbered levels can be perceived as fostering an undesirable competitive spirit.

Figure 7-11. Content example: stars and bars (iconic numbered levels).

Figure 7-12. Karma example: Orkut profile with an accumulator and iconic numbered levels.

Figure 7-13. Karma example: Experience levels and guild rank (sortable).

Named levels

In a named levels display pattern, a short, readable string of characters is substituted for a level number.

The name adds semantic meaning to each level so that users can more easily recognize the entity’s reputation when the reputation is displayed separately. Is the user a “silver contributor” or is the beef prime, choice, select, or standard?

Using named levels:

Named levels are useful when the number of labels is five or fewer, so that each level can have a name that accurately expresses its meaning.

Table 7-1 and Figure 7-14 show the meat grading levels used by the United States Department of Agriculture. The labels are descriptive, representing existing industry terms, and several are shared across different animal species—providing consumers a consistent standard for comparison.

Table 7-1. Content example: USDA meat grades

Species	Quality grades
Beef	Prime, choice, select, standard, utility, cutter, canner
Lamb and yearling mutton	Prime, choice, good, utility, cull
Mutton	Choice, good, utility, cull
Veal and calf	Prime, choice, good, standard, utility

Figure 7-14. Content example: USDA prime, choice, and select stamps.

Named levels are particularly useful when numeric levels are too impersonal or encourage undesired competition.

If you’re considering using numeric levels but find that the top and bottom levels should feel closer together than the numeric distance between them would otherwise indicate, consider using named levels instead. This is especially useful with karma scores so that new participants don’t get stuck with a demeaning level indicator, like “Level 1 of 10.”

Figure 7-15 displays the current named levels used by WikiAnswers.com for user contributions. The original three categories were Bronze, Silver, and Gold—named after competitive medals. They are granted when nonlinearly increasing thresholds are met. Over time, the system has been expanded on three separate occasions to reward the nearly compulsive contributions of a handful of users.
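
A named-level lookup with nonlinearly increasing thresholds might be sketched in Python as shown below. The threshold values (100, 1,000, and 10,000 contributions) are invented for illustration and are not WikiAnswers’ actual rules. Returning no name at all below the first threshold is one way to avoid labeling newcomers with a demeaning level.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical thresholds and names, chosen only for this example.
LEVEL_THRESHOLDS = [100, 1_000, 10_000]
LEVEL_NAMES = ["Bronze", "Silver", "Gold"]


def named_level(contributions: int) -> Optional[str]:
    """Return the level name earned so far, or None below the first threshold."""
    index = bisect_right(LEVEL_THRESHOLDS, contributions)
    if index == 0:
        return None  # newcomers get no badge rather than a lowest-tier label
    return LEVEL_NAMES[index - 1]


for count in (50, 100, 9_999, 250_000):
    print(count, named_level(count))
```

Note that adding a level above Gold later means appending to both lists, which is where the naming difficulties discussed in this section arise.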

Pros

Hiding level numbers allows for more expressiveness.
Level names can be thematically appropriate to, and vary by, your application(s).
Common hierarchies work well: for example, poor, average, good, and excellent.
This pattern is usually stronger when the named levels are displayed alongside other ratings, such as stars, points, and raw scores, to clarify them.

Cons

Care must be taken when setting up the level names if you ever expect to add more to either end of the scale.
Something else for your user to learn.
Cultural bias can be a problem, especially if your site has an international audience. For example, the letter grading system of F, D, C, B, A is not internationally understood.
Ambiguous names are more confusing than simple level numbers. Is the Ruby level better than Gold?

Figure 7-15. Karma example: The contributor levels on WikiAnswers have seen several awkward expansions.

Ranked Lists

A ranked list is based on highest or lowest reputation scores. Ranking systems are by their very nature comparative, and—human nature being what it is—the online community is likely to perceive this design choice as an encouragement of competition between users.

Leaderboard ranking

A leaderboard is a rank-ordered listing of reputable entities within your community or content pool. Leaderboards may be displayed in a grid, with rows representing the entities and columns describing those entities across one or more characteristics (name, number of views, and so on). Leaderboards provide an easy and approachable way to display the best performers in your community.

Use leaderboards for content liberally. Provide filtered views of the boards to slice and dice by time (“Popular Today/This Week/All Time”) or by reputation type (“Most Viewed/Top Rated”).
Figure 7-16 shows YouTube’s leaderboard ranking for most viewed videos as a grid. With numbers this high, it’s hard for potential reputation abusers to push inappropriate content onto the first page. Note that there are several leaderboards, one each for Today, This Week, This Month, and All Time.

Use leaderboards for people sparingly, and only in contexts that are competitive by nature. Consider giving leaderboards for people a narrow scope (for example, only ranking me against my friends, to keep the comparisons fun and the stakes low).

Figure 7-17 displays Yahoo! Answers’ leaderboard. The original version of this page was based solely on the number of points accumulated by participation, and users quickly figured out which actions produced the most points for the least effort. When the user’s best-answer percentage was eventually added to the profile display, it was discovered that the top-ranked users all had quality scores of less than 10%!
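
The filtered views described for content leaderboards (“Popular Today/This Week/All Time”) could be computed along these lines. This Python sketch assumes view events are available as (item_id, viewed_at) pairs; the function name and data shape are assumptions for illustration, and a production system would more likely maintain these counts incrementally.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Optional


def most_viewed(view_events: Iterable[tuple[str, datetime]],
                now: datetime,
                window: Optional[timedelta] = None,
                limit: int = 10) -> list[tuple[str, int]]:
    """Rank items by view count within a time window (None means all time)."""
    counts: Counter = Counter()
    for item_id, viewed_at in view_events:
        if window is None or now - viewed_at <= window:
            counts[item_id] += 1
    return counts.most_common(limit)


now = datetime(2009, 6, 15, 12, 0)
events = [
    ("a", now - timedelta(hours=1)),
    ("a", now - timedelta(hours=2)),
    ("b", now - timedelta(days=3)),
    ("b", now - timedelta(days=4)),
    ("b", now - timedelta(days=5)),
    ("a", now - timedelta(days=30)),
    ("b", now - timedelta(days=40)),
    ("b", now - timedelta(days=41)),
]
print(most_viewed(events, now, timedelta(days=1)))  # today
print(most_viewed(events, now, timedelta(days=7)))  # this week
print(most_viewed(events, now))                     # all time
```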

Pros

Clear and browsable way to compare items for specific qualities.
Data-intensive display: leaderboards satiate demand from information junkie users.

Cons

May incite unhealthy competition to reach (or stay at) the top of the leaderboard.
When used with accumulators, leaderboards can get stale as a few popular items move to the top and get stuck there, since nothing makes something more popular than its appearance on the list of most popular things.

Figure 7-16. Content example: YouTube’s most viewed videos.

Figure 7-17. Karma example: Yahoo! Answers leaderboard.

Top-X ranking

This is a specialized type of leaderboard where top-ranking entities are grouped into numerical categories of performance. Achieving top-10 status (or even top-100) should be a rare and celebrated feat.

When using Top-X ranking:

Use top-X leaderboards for content to highlight only the best of the best contributions in your community.
Figure 7-18 shows a Top-X display for content: Billboard’s Hot 100 list of top recordings. The artists themselves have very little, if any, direct influence over their songs’ rank on this list.

Use top-X designations for people sparingly, and only in contexts that are competitive by nature. Because available categories in a top-X system are bounded, they will have greater perceived value in the community.

Figure 7-19 displays the new index of Top-X karma for Amazon.com review writers. The very high number of reviews written by each of these leaders creates value both for Amazon and the reviewers themselves. Authors and publishers seek them out to review or endorse their books, sometimes for a nominal fee. The original version of this reputation system, now known as “Classic Reviewer Rank,” suffered deeply from first-mover effects (see First-mover effects) and other problems detailed in this book. This eventually led to the creation of the new model, as pictured.
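
Grouping ranks into top-X categories can be sketched as follows. The tier boundaries (10 and 100) and the function names are assumptions for illustration; they are not the tiers used by Billboard or Amazon. Ties are broken here by name so that the output is deterministic, which is also an assumption.

```python
from typing import Optional


def top_x_category(rank: int, tiers: tuple[int, ...] = (10, 100)) -> Optional[str]:
    """Return the smallest tier containing a 1-based rank, or None if outside all tiers."""
    if rank < 1:
        raise ValueError("rank is 1-based")
    for tier in sorted(tiers):
        if rank <= tier:
            return f"Top {tier}"
    return None


def top_x_badges(scores: dict[str, float],
                 tiers: tuple[int, ...] = (10, 100)) -> dict[str, str]:
    """Assign a top-X badge to every entity whose rank falls within a tier."""
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    badges = {}
    for position, name in enumerate(ranked, start=1):
        category = top_x_category(position, tiers)
        if category is not None:
            badges[name] = category
    return badges


print(top_x_badges({"ann": 950.0, "raj": 720.0, "lee": 15.0}, tiers=(1, 2)))
```

Because only the category is exposed, an entity at rank 11 and one at rank 99 receive the same badge; whether to reveal the exact rank as well is a separate design decision.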

Pros

Highly motivating for top performers. The prestige of earning a top-10 or top-100 designation may make contributors work twice as hard to keep it.
Yields a small, bounded set of entities to promote as high quality.

Cons

May incite unhealthy competition to reach (or stay at) the top of the ranks.
For top-X karma based on accumulators, if a user’s reputation falls just below a category dividing line and the user knows his score, these categories often lead to minimum/maximum gaming, in which the user engages in a flurry of low-quality activity just to advance his top-X category.
Top-X karma badges are unfamiliar to users who don’t contribute content. Don’t expect passive users to understand or even notice a top-X badge displayed alongside content reputation. Top-X badges are for content producers, not consumers.

Figure 7-18. Content example: Billboard’s Hot 100.

Figure 7-19. Karma example: Amazon’s top reviewer rankings.

Practitioner’s Tips

Leaderboards Considered Harmful

It’s still too early to speak in absolutes about the design of social-media sites, but one fact is becoming abundantly clear: ranking the members of your community—and pitting them against one another in a competitive fashion—is typically a bad idea. Like the fabled djinni of yore, leaderboards on your site promise riches (comparisons! incentives! user engagement!!) but often lead to undesired consequences.

The thought process involved in creating leaderboards typically goes something like this: there’s an activity on your site that you’d like to promote; a number of people are engaged in that activity who should be recognized; and a whole bunch of other people won’t jump in without a kick in the pants. Leaderboards seem like the perfect solution. Active contributors will get their recognition: placement at the top of the ranks. The also-rans will find incentive: to emulate leaders and climb the boards.

And that activity you’re trying to promote? Site usage should swell with all those earnest, motivated users plugging away, right? It’s the classic win-win-win scenario. In practice, employing this pattern has rarely been this straightforward. Here are just a few reasons why leaderboards are hard to get right.

What do you measure?

Many leaderboards make the mistake of basing standings only on what is easy to measure. Unfortunately, what’s easy to measure often tells you nothing at all about what is good. Leaderboards tend to fare well in very competitive contexts, because there’s a convenient correlation between measurability and quality. (It’s called “performance”—number of wins versus losses within overall attempts.)

But how do you measure quality in a user-generated video community? Or a site for ratings and reviews? It should have very little to do with the quantities of simple activity that a person generates (the number of times an action is repeated, a comment given or a review posted). But such measurements—discrete, countable, and objective—are exactly what leaderboards excel at.

Whatever you do measure will be taken way too seriously

Even if you succeed in leavening your leaderboard with metrics for quality (perhaps you weigh community votes or count send-to-a-friend actions), be aware that—because a leaderboard singles out these factors for praise and reward—your community will hold them in high esteem, too. Leaderboards have a kind of “Code of Hammurabi” effect on community values: what’s written becomes the law of the land. You’ll likely notice that effect in the activities that people will—and won’t—engage in on your site. So tread carefully. Are you really that much smarter than your community, that you alone should dictate its character?

If it looks like a leaderboard and quacks like a leaderboard…

Even sites that don’t display overt leaderboards may veer too closely into the realm of comparative statistics. Consider Twitter and its prominent display of community members’ stats.

The problem may not lie with the existence of the stats but in the prominence of their display (see Figure 7-20). They give Twitter the appearance of a community that values popularity and the sheer size of a participant’s social network. Is it any wonder, then, that a whole host of community-created leaderboards have sprung up to automate just such comparisons? Twitterholic, Twitterank, Favrd, and a whole host of others are the natural extension of this value-by-numbers approach.

Figure 7-20. You’d be completely forgiven if you signed into Twitter and mistook this dashboard for a scoreboard!

Leaderboards are powerful and capricious

In the earliest days of Orkut (Google’s also-ran entry in social networking), the product managers put a fun little widget at the top of the site: a country counter, showing where members were from. Cute and harmless, right? Google had no way of knowing, however, that seemingly the entire population of Brazil would make it a point of national pride to push their country to the top of that list. Brazilian blogger Naitze Teng wrote:

Communities dedicated to raising the number of Brazilians on Orkut were following the numbers closely, planning gatherings and flash mobs to coincide with the inevitable. When it was reported that Brazilians had outnumbered Americans registered on Orkut, parties…were thrown in celebration.

Brazil has maintained its number one position on Orkut (as of this writing, 51% of Orkut users are Brazilian; the United States and India are tied for a distant second with 17% apiece). Orkut today is basically a Brazilian social network. That’s not a bad “problem” for Google to have, but it’s probably not an outcome that it would have expected from such a simple, small, and insignificant thing as a leaderboard widget.

Who benefits?

The most insidious artifact of a leaderboard community may be that the very presence of a leaderboard changes the community dynamic and calls into question the motivations for every action that users take. If that sounds a bit extreme, consider Twitter: friend counts and followers have become the coins of that realm. When you get a notification of a new follower, aren’t you just a little more likely to believe that it’s just someone fishing around for a reciprocal follow? Sad, but true. And this despite the fact that Twitter itself never has officially featured a leaderboard; it merely made the statistics known and provided an API to get at them. In doing so, it may have let the genie out of the bottle.

Note

“Leaderboards Considered Harmful” first appeared as an essay in Designing Social Interfaces (O’Reilly) by Christian Crumlish and Erin Malone, also available online at DesigningSocialInterfaces.com.

Going Beyond Displaying Reputation

This entire chapter has focused on the explicit display of reputation, usually directly to users. Though important, this isn’t typically the most valuable use for this information. Chapter 8 describes using reputation to modify the utility of an application—to separate the best entities from the pack, and to help identify and destroy the most harmful ones.
