Chapter 7

Usability (User) Testing

An important category of system test cases is one that attempts to find human-factor, or usability, problems. When the first edition of this book was published, the computing industry mostly ignored the human factors associated with computer software. Developers gave little attention to how humans interacted with their software. That is not to say that there were no developers testing applications at the user level. In the early 1980s, some—including developers at the Xerox Palo Alto Research Center (PARC), for example—were conducting user-based software testing.

By 1987 or 1988, the three of us had become intimately involved in usability testing of early personal computer hardware and software, when we contracted with computer manufacturers to test and review their new desktop computers prior to release to the public. Over perhaps two years, this prerelease testing caught potential usability problems in new hardware and software designs before they reached customers. These early computer manufacturers obviously were convinced that the time and expense required for this level of user testing resulted in real marketing and financial advantages.

Usability Testing Basics

Today's software systems—particularly those designed for a mass, commercial market—generally have undergone extensive human-factor studies, and modern programs, of course, benefit from the thousands of programs and systems that have gone before. Nevertheless, an analysis of human factors is still a highly subjective matter. Here's our list of questions you might ask to derive testing considerations:

1. Has each user interface been tailored to the intelligence, educational background, and environmental pressures of the end user?

2. Are the outputs of the program meaningful, noninsulting to the user, and devoid of computer gibberish?

3. Are the error diagnostics, such as error messages, straightforward, or does the user need a PhD in computer science to comprehend them? For instance, does the program produce such messages as IEK022A OPEN ERROR ON FILE 'SYSIN' ABEND CODE=102? Messages such as these weren't all that uncommon in software systems of the 1970s and 1980s. Mass-market systems do better today in this regard, but users still will encounter unhelpful messages such as, “An unknown error has occurred,” or “This program has encountered an error and must be restarted.”

Programs you design yourself are under your control and should not be plagued with such useless messages. Even if you didn't design the program, if you are on the testing team, you can push for improvements in this area of the human interface.

4. Does the total set of user interfaces exhibit considerable conceptual integrity, an underlying consistency, and uniformity of syntax, conventions, semantics, format, style, and abbreviations?

5. Where accuracy is vital, such as in an online banking system, is sufficient redundancy present in the input? For example, such a system should ask for an account number, a customer name, and a personal identification number (PIN) to verify that the proper person is accessing account information.

6. Does the system contain an excessive number of options, or options that are unlikely to be used? One trend in modern software is to present to users only those menu choices they are most likely to use, based on software testing and design considerations. Then a well-designed program can learn from individual users and begin to present those menu items that they frequently access. Even with such an intelligent menu system, successful programs still must be designed so that accessing the various options is logical and intuitive.

7. Does the system return some type of immediate acknowledgment to all inputs? Where a mouse click is the input, for example, the chosen item can change color, or a button object can depress or be presented in a raised format. If the user is expected to choose from a list, the selected number should be echoed on the screen when the choice is made. Moreover, if the selected action requires some processing time—which is frequently the case when the software is accessing a remote system—then a message should be displayed informing the user of what is going on. This level of testing sometimes is referred to as component testing, whereby interactive software components are tested for reasonable selection and user feedback. (A minimal sketch of this feedback pattern appears after this list.)

8. Is the program easy to use? For example, is the input case-sensitive without making this fact clear to the user? Also, if a program requires navigation through a series of menus or options, is it clear how to return to the main menu? Can the user easily move up or down one level?

9. Is the design conducive to user accuracy? One test would be an analysis of how many errors each user makes during data entry or when choosing program options. Were these errors merely an inconvenience—errors the user was able to correct—or did an incorrect choice or action cause some kind of application failure?

10. Are the user actions easily repeated in later sessions? In other words, is the software design conducive to the user learning how to be more efficient in using the system?

11. Did the user feel confident while navigating the various paths or menu choices? A subjective evaluation might be the user response to using the application. At the end of the session did the user feel stressed by or satisfied with the outcome? Would the user be likely to choose this system for his or her own use, or recommend it to someone else?

12. Did the software live up to its design promise? Finally, usability testing should include an evaluation of the software specifications versus the actual operation. From the user perspective—real people using the software in a real-world environment—did the software perform according to its specifications?
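
To make the acknowledgment pattern in item 7 concrete, here is a minimal sketch written with Python's standard tkinter toolkit. The widget layout, the label text, and the simulated two-second delay are our own inventions for illustration; the point is simply that the button acknowledges a click immediately and reports status while a slow operation completes.

    import tkinter as tk

    def on_submit():
        # Acknowledge the click immediately: depress and disable the button,
        # and tell the user that a slow operation is in progress.
        submit.config(state=tk.DISABLED, relief=tk.SUNKEN)
        status.config(text="Contacting remote system...")
        # Simulate a two-second remote operation without freezing the interface.
        root.after(2000, finish)

    def finish():
        # Restore the button and confirm completion.
        status.config(text="Done.")
        submit.config(state=tk.NORMAL, relief=tk.RAISED)

    root = tk.Tk()
    root.title("Feedback demo")
    submit = tk.Button(root, text="Submit", command=on_submit)
    submit.pack(padx=20, pady=10)
    status = tk.Label(root, text="Ready.")
    status.pack(padx=20, pady=10)
    root.mainloop()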

Usability or user-based testing is basically a black-box testing technique. Recall from our discussion in Chapter 2 that black-box testing concentrates on finding situations in which the program does not behave according to its specifications. In a black-box scenario you are not concerned with the internal workings of the software, or even with understanding the program structure. Presented this way, usability testing obviously is an important part of any development process. If, because of improper design, a cumbersome user interface, or missed or ignored specifications, users perceive that a given application does not perform according to its specifications, the development process has failed. User testing should uncover problems ranging from design flaws to software ergonomics mistakes.

Usability Testing Process

It should be obvious from our list of items to test that usability testing is more than simply seeking user opinions or high-level reactions to a software application. Once the errors have been found and corrected and an application is ready for release or for sale, focus groups can be used to elicit opinions from users or potential purchasers. That, however, is marketing, not testing. Usability testing occurs earlier in the process and is much more involved.

Any usability test should begin with a plan. (Review our vital software testing guidelines in Chapter 2, Table 2.1.) You should establish practical, real-world, repeatable exercises for each user to conduct. Design these testing scenarios to present the user with all aspects of the software, perhaps in varied or random order. For example, among the processes you might test in a customer tracking application are the following (a sketch of one way to encode such tasks appears after the list):

  • Locate an individual customer record and modify it.
  • Locate a company record and modify it.
  • Create a new company record.
  • Delete a company record.
  • Generate a list of all companies of a certain type.
  • Print this list.
  • Export a selected list of contacts to a text file or spreadsheet format.
  • Import a text file or spreadsheet file of contacts from another application.
  • Add a photograph to one or more records.
  • Create and save a custom report.
  • Customize the menu structure.
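
As noted above, one way to keep these exercises repeatable is to encode each task as data, so that every participant receives identical instructions and observers record results against the same task identifiers. The following minimal Python sketch uses hypothetical field names; the fixed seed guarantees that a randomized task order can be reproduced exactly in a later session.

    import random

    # Each task is a scripted, repeatable exercise drawn from the list above.
    TASKS = [
        {"id": "T1", "instruction": "Locate an individual customer record and modify it."},
        {"id": "T2", "instruction": "Create a new company record."},
        {"id": "T3", "instruction": "Generate and print a list of all companies of a certain type."},
        {"id": "T4", "instruction": "Export a selected list of contacts to a spreadsheet file."},
    ]

    def session_plan(seed):
        """Return the task list in a randomized but reproducible order."""
        plan = TASKS[:]
        random.Random(seed).shuffle(plan)  # same seed, same order: the session can be rerun exactly
        return plan

    for task in session_plan(seed=42):
        print(task["id"], "-", task["instruction"])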

During each phase of the test, have observers document the user experience as each task is performed. When the test is complete, conduct an interview with the user, or provide a written questionnaire, to document other aspects of the user's experience, such as his or her perception of how the software's actual behavior compared with its specifications.

In addition, write down detailed instructions for user tests, to ensure that each user starts with the same information, presented in the same way. Otherwise, you risk coloring some of the tests if some users receive different instructions.

Test User Selection

A complete usability testing protocol usually involves multiple tests from the same users, as well as tests from multiple users. Why multiple tests from the same users? One area we want to test is user recall, that is, how much of what a user learns about software operation is retained from session to session. Any new system presented to users for the first time will require some time to learn, but if the design for a particular application is consistent with the industry or technology with which the target user community is familiar, the learning process should be fairly quick.

A user already familiar with computer-based engineering design, for example, would expect any new software in this same industry to follow certain conventions of terminology, menu design, and perhaps even color, shading, and font usage. Certainly, a developer may stray from these conventions purposefully to achieve perceived operational improvements, but if the design goes too far afield from industry standards and expectations, the software will take longer for new users to learn; in fact, user acceptance may be so slow as to cause the application to be a commercial failure. If the application is developed for a single client, such differences may result in the client rejecting the design or requiring a complete user interface redesign. Either result is a costly developer mistake.

Therefore, software targeted for a specific end-user type or industry should be tested by what could be described as expert users, people already familiar with this class of application in a real-world environment. In contrast, software with a more general target market—mobile device software, for example, or general-purpose Web pages—might better be tested by users selected randomly. (Such test user selection sometimes is referred to as hallway testing or hallway intercept testing, meaning that the test users are selected at random from people passing by in the hallway.)

How Many Users Do You Need?

When designing a usability test plan, the question “How many testers do I need?” will come to the forefront. The cost of usability testers is often overlooked in the development process and can add unexpected expense to a project. You need to find the number of testers that will identify the most errors for the least capital investment.

Intuitively, you may think that the more testers you use, the better. After all, if you have enough evaluators testing your product, then all the errors should be found. This reasoning has several flaws, however. First, as mentioned, it is expensive. Second, it can become a logistics nightmare. Finally, it is unlikely that you can ever detect 100 percent of your application's usability problems.

Fortunately, significant research on usability has been conducted during the last 15 years. Based on the work of Jakob Nielsen, a usability testing expert, you may need fewer testers than you think. Nielsen's research found that the percentage of usability problems found in testing can be modeled as:

E = 100 × (1 - (1 - L)^n)

where: E = percent of errors found

n = number of testers

L = proportion of usability problems found by a single tester (expressed as a decimal fraction)

Using the equation with L = 31 percent, a reasonable value Nielsen also gleaned from his research, produces the graph shown in Figure 7.1.

Figure 7.1 Percent Errors Found Versus Number of Users.

Examining the graph reveals a few interesting points. First, as we intuitively know, it is never possible to detect all of the usability errors in an application, even in theory: the curve converges on 100 percent but never actually reaches it. Second, you need only a small number of testers. The graph shows that roughly 84 percent of the errors are detected by just five testers.
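
The numbers behind Figure 7.1 are easy to reproduce. Here is a minimal Python sketch of the calculation (our own illustration, not Nielsen's code), using L = 0.31; the loop bound of ten testers is arbitrary:

    # Nielsen's curve: E = 100 * (1 - (1 - L)**n)
    L = 0.31  # proportion of usability problems a single tester finds

    for n in range(1, 11):
        E = 100 * (1 - (1 - L) ** n)
        print(f"{n:2d} testers -> {E:5.1f}% of problems found")

Running this shows the steep early climb and the long flat tail: one tester finds 31 percent of the problems, five find roughly 84 percent, and ten find about 98 percent.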

From a project manager's point of view, this is refreshing news. No longer do you need to incur the cost and complexity of working with a large group of testers to check your application. Instead, you can focus on designing, executing, and analyzing your tests—putting your effort and money into what will make the most difference.

Also, with fewer testers you have less analysis to do, so you can quickly implement changes to the application and the testing methodology, and then test again with a new group of testers. In this iterative fashion you can ensure that you catch most problems at minimal cost and time.

Nielsen's research was conducted in the early 1990s while he was a systems analyst at Sun Microsystems. On the one hand, his data and approach to usability testing provide concrete guidance to those of us involved in software design. On the other hand, as usability testing has become more important and commonplace, and as more evidence has been gathered from practical testing and better formulaic analysis, some researchers have come to question Nielsen's firm statement that three to five users should be enough.

Nielsen himself cautions that the precise number of testers depends on economic considerations (how many testers your budget will support) and on the type of system you are testing. Critical systems such as navigation applications, banking or other financial software, or security-related programs will, perforce, require closer user scrutiny than less-critical software.

Among the considerations important to developers who are designing a usability testing program is whether the number of users and their individual orientations sufficiently represent the total population of potential users. In addition, as Nielsen notes, some programs are more complex than others, meaning that detecting a significantly large percentage of errors will be more difficult. And, because different users, owing to their backgrounds and experiences, are likely to detect different types of errors, an individual testing situation may dictate a larger number of testers.

As with any testing methodology, it is up to the developers and project administrators to design the tests, present a reasonable budget, evaluate interim results, and conduct regressive tests as appropriate to the software system, the overall project, and the client.

Data-Gathering Methods

Test administrators or observers can gather test results in several ways. Videotaping a user test and using a think-aloud protocol can provide excellent data on software usability and user perceptions about the application. A think-aloud protocol involves users speaking their thoughts and observations aloud while they perform the assigned software testing tasks. Using this process, the test participants describe out loud their task, what they are thinking about the task, and whatever else comes to mind as they move through the testing scenario. Even when using think-aloud protocol testing, developers may want to follow up with participants after the test to get posttest comments, feelings, and observations. Taken together, these two levels of user thoughts and comments can provide valuable feedback to developers for software corrections or improvements.

A disadvantage of the think-aloud process, where videotaping or observers are involved, is the possibility that the user experience will be clouded or modified by the unnatural testing environment. Developers also may wish to conduct remote user testing, whereby the application is installed at the user's own site, where the software ultimately would be used. Remote testing has the advantage of placing the user in a familiar environment, one in which the final application likely would be used, thus reducing the potential for external influences to modify test results. Of course, the disadvantage is that developers may not receive feedback as detailed as a think-aloud protocol would provide.

Nevertheless, in a remote testing environment, accurate user data still can be gathered. Additional software can be installed with the application to be tested to gather user keystrokes and to capture time required for the user to complete each assigned task. This requires additional development time (and more software), but the results of such tests can be enlightening and very detailed.
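
As a sketch of this kind of lightweight instrumentation (hypothetical, not a particular commercial tool), the following Python class records how long each assigned task takes and writes the results to a CSV log for later analysis. In a real deployment, the placeholder function would be replaced by hooks into the application under test.

    import csv
    import time

    class TaskTimer:
        """Record elapsed time for each assigned usability task."""

        def __init__(self, logfile):
            self.logfile = logfile
            self.rows = []

        def run(self, task_id, func):
            # Time one assigned task; func stands in for the user completing it.
            start = time.time()
            func()
            elapsed = time.time() - start
            self.rows.append({"task": task_id, "seconds": round(elapsed, 2)})

        def save(self):
            # Write one row per task so results are easy to tabulate later.
            with open(self.logfile, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=["task", "seconds"])
                writer.writeheader()
                writer.writerows(self.rows)

    timer = TaskTimer("session_log.csv")
    timer.run("T1", lambda: time.sleep(0.5))  # placeholder for a real user task
    timer.save()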

In the absence of timing or keystroke capture software, testing users can be tasked with writing down the start and end times of each assigned task, along with brief one-word or short-phrase comments during the process. Posttest questionnaires or interviews can help users recall their thoughts and opinions about the software.

A sophisticated but potentially useful data-gathering protocol is eye tracking. When we read a printed page, view a graphical presentation, or interact with a computer screen, our eyes move over the scanned material in particular patterns. Research data gathered on eye movement over more than 100 years shows that eye movement—particularly how long an observer pauses on certain visual elements—reflects at least to some degree the thought processes of the observer. Tracking this eye movement, which can be done with video systems and other technologies, shows researchers which visual elements attract the observer's attention, in what order, and for how long. Such data is potentially useful in determining the efficiency of software screens presented to users.

Despite extensive research during the last half of the twentieth century, however, some controversy remains over the ultimate value of eye movement research in specific applications. Still, coupled with other user testing techniques, eye tracking can be a useful tool where developers need the deepest possible user input data to ensure the highest level of software efficiency (weapons guidance systems, robotic control systems, vehicle controls, or other systems that require fast and accurate responses).

Usability Questionnaire

As with the software testing procedure itself, a usability questionnaire should be carefully planned to return the information required from the associated test procedure. Although you may want to include some questions that elicit free-form comments from the user, in general you want to develop questionnaires that generate responses that can be counted and analyzed across the spectrum of testers. These fall into three general types:

  • Yes/no answers
  • True/false answers
  • Agree/disagree on a scale

For example, instead of asking, “What is your opinion of the main menu system?” you might ask a series of questions that require an answer from 1 to 5, where 5 means totally agree and 1 means totally disagree:

1. The main menu was easy to navigate.

2. It was easy to find the proper software operation from the main menu.

3. The screen design led me quickly to the correct software operational choices.

4. Once I had operated the system, it was easy to remember how to repeat my actions.

5. The menu operations did not provide enough feedback to verify my choices.

6. The main menu was more difficult to navigate than other similar programs I use.

7. I had difficulty repeating previously accomplished operations.

Notice that it may be good practice to ask the same question more than once, but present it from the opposite perspective, so that one phrasing elicits a negative response and the other a positive one. Such practice can help confirm that the user understood the question and that his or her perceptions remained consistent. In addition, you will want to separate the user questionnaire into sections that correspond to the software areas tested or to the testing tasks assigned.

Experience will teach you quickly which types of questions are conducive to data analysis and which ones aren't very useful. Statistical analysis software is available to help capture and interpret data. With a small number of testing users, the usability test results may be obvious; or you might develop an ad hoc analysis routine within a spreadsheet application to better document results. For large software systems that undergo extensive testing with a large user base, statistical software may help uncover trends that aren't obvious with manual interpretation methods.
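
With a small tester pool, a few lines of Python (or a spreadsheet) can handle such analysis. The following sketch uses invented response data for the seven-question sample above and reverse-scores the negatively worded items (questions 5 through 7) so that all means read in the same direction:

    REVERSED = {5, 6, 7}  # negatively worded questions from the sample questionnaire

    # One row per tester; column i holds the answer to question i + 1 (invented data).
    responses = [
        [5, 4, 4, 5, 2, 1, 2],
        [4, 4, 3, 4, 2, 2, 1],
        [5, 5, 4, 4, 1, 2, 2],
    ]

    def score(row):
        # Reverse-code negatively worded items so 5 always means a favorable response.
        return [(6 - answer) if (i + 1) in REVERSED else answer
                for i, answer in enumerate(row)]

    scored = [score(row) for row in responses]
    for q in range(len(scored[0])):
        mean = sum(row[q] for row in scored) / len(scored)
        print(f"Question {q + 1}: mean agreement {mean:.2f}")

Comparing a question's mean with that of its reversed twin (questions 1 and 6, for example) is a quick consistency check: a large gap suggests the question was misunderstood.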

When Is Enough, Enough?

How do you plan usability testing so that all aspects of the software are reasonably tested while staying within an acceptable budget? The answer to that question, of course, depends in part on the complexity of the system or unit being tested. If budget and time allow, it is advisable to test software in stages, as each segment is completed. If individual components have been tested throughout the development process, then the final series of tests need only test the integrated operation of the parts.

Additionally, you may design component tests, which are intended to test the usability of an interactive component, something that requires user input and that responds to this input in a user-perceivable way. This kind of feedback testing can help improve the user experience, reduce operational errors, and improve software consistency. Again, if you have tested a software system at this level as the user interface was being designed, you will have collected a significant body of important testing and operational knowledge before total system testing begins.

How many individual users should test your software? Again, system complexity and initial test results should dictate the number of individual testers. For example, if three or five (or some reasonable number) of users have difficulty navigating from the opening screen to screens that support the assigned tasks, and if these users are sufficiently representative of the target market, then you likely have enough information to tell you that the user interface needs more design work.

A reasonable corollary to this might be that if none of the initial testers have a problem navigating through their assigned tasks, and none uncover any mistakes or malfunctions, then perhaps the testing pool is too small. After all, is it reasonable to assume that usability tests of a reasonably complex software system will uncover no errors or required changes? Recall principle 6, from Table 2.1: Examining a program to see if it does not do what it is supposed to do is only half the battle; the other half is seeing whether the program does what it is not supposed to do. There's a subtle difference in this comparison. You might find that a series of users determine that a program does, in fact, seem to do what it is supposed to do. They find no errors or problems in working through the software. But have they also proven that the program isn't doing anything it is not supposed to do? If things appear to be running too smoothly during initial testing, it probably is time for more tests.

We don't believe there is a formula that tells you how many tests each user should conduct, or how many iterations of each test should be required. We do believe, however, that careful analysis and understanding of the results you gather from some reasonable number of testers and tests can guide you to the answer of when enough testing is enough.

Summary

The complexity of modern software, coupled with the pressure of intense competition and tight deadlines, makes user testing of any software product crucial to successful development. It stands to reason that the targeted software user can be a valuable asset during testing. The knowledgeable user can determine whether the product meets the goal of its design, and by conducting real-world tasks can find errors of commission and omission.

Depending on the software target market, developers also may benefit from selecting random users—persons who are not familiar with the program's specification, or perhaps even the industry or market for which it is intended—who can uncover errors or user interface problems. Just as developers don't make good testers of their own errors, expert users may avoid operational areas that might produce problems, because they know how the software is supposed to work. Over many years of software development we have discovered one unavoidable testing truth: software the developer has tested for many hours can be broken easily, and in a short time, by an unsophisticated user who attempts a task for which the user interface or the software was not designed.

Remember, too, that a key to successful user (or usability) testing is accurate and detailed data gathering and analysis. The data-gathering process actually begins with the development of detailed user instructions and a task list. It ends by compiling results from user observation or posttest questionnaires.

Finally, the testing results must be interpreted, and then developers must effect software changes identified from the data. This may be an iterative process wherein the same testing users are asked to complete similar tasks after identified software changes have been completed.
