Chapter 14

Evaluation Methods

Usability evaluations can generally be divided into formative and summative evaluations. Formative evaluations are done early in the product development life cycle to discover insights and shape the design direction. They typically involve usability inspection methods or usability testing with low-fidelity mocks or prototypes. Summative evaluations, on the other hand, are typically done toward the end of the product development life cycle with high-fidelity prototypes or the actual final product to evaluate it against a set of metrics (e.g., time on task, success rate). This can be done via in-person or remote usability testing or live experiments. Table 14.1 lists several formative and summative evaluation methods. Each method mentioned in this chapter could fill an entire chapter, if not a book, devoted just to its origins, alternatives, and intricacies. Our goal, therefore, is to provide an overview of the available evaluation methods and point you toward resources where you can learn more.

Table 14.1

Comparison of evaluation methodologies

| Method | Formative or summative | State of your product | Goal | Resources required |
| --- | --- | --- | --- | --- |
| Heuristic evaluation | Formative | Low to high fidelity | Identify violations of known usability guidelines | Low |
| Cognitive walkthrough | Formative | Low to high fidelity | Identify low-hanging issues early | Low |
| Usability testing (in person) | It depends | Any stage | Identify usability issues | Medium |
| Eye tracking | Summative | Very high fidelity to launched | Identify where users look for features/information | High |
| RITE | Formative | Any stage | Iterate quickly on a design | High |
| Desirability testing | Summative | Very high fidelity to launched | Measure emotional response | Medium |
| Remote testing | Summative | Very high fidelity to launched | Identify usability issues across a large sample | Low to high |
| Live experiments | Summative | Launched | Measure product changes with a large sample of actual users | High |


Introduction

Just like with other research methods described in this book, none of the evaluation methods here are meant to stand alone. Each uncovers different issues and should be used in combination to develop the ideal user experience.

At a Glance

> Things to be aware of when conducting evaluations

> Evaluation formats to choose from

> Preparing for an evaluation

> Data analysis and interpretation

> Communicating the findings

Things to Be Aware of When Conducting Evaluations

It is ideal to have a third party (e.g., someone who has not been directly involved in the design of the product or service) conduct the evaluations to minimize bias. This is not always possible, but regardless of who conducts the evaluation, that person must remain neutral. This means that he or she must:

 Recruit representative participants, not just those who are fans or critics of your company/product

 Use a range of representative tasks, not just those that your product/service is best or worst at

 Use neutral language and nonverbal cues to avoid giving participants any signal what the “right” response is or what you want to hear, and never guide the participants or provide your feedback on the product

 Be true to the data, rather than interpreting what he or she thinks the participant “really meant”

Cartoon by Abi Jones

Evaluation Methods to Choose From

Depending on where you are in product development, what your research questions are, and what your budget is, there are a number of evaluation methodologies to choose from (see Table 14.1 for a comparison). Whether you choose a storyboard, paper prototype, low- or high-fidelity interactive prototype, or a launched product, you should evaluate it early and often.

Usability Inspection Methods

Usability inspection methods leverage experts (e.g., people with experience in usability/user research, subject matter experts) rather than involve actual end users to evaluate your product or service against a set of specific criteria. These are quick and cheap ways of catching the “low-hanging fruit” or obvious usability issues throughout the product development cycle. If you are pressed for time or budget, these methods represent a minimum standard to meet. However, be aware that experts can miss issues that methods that involve users will reveal. System experts can make incorrect assumptions about what the end user knows or wants.

Heuristic Evaluations

Jakob Nielsen and Rolf Molich introduced the heuristic evaluation as a “discount” usability inspection method (Nielsen & Molich, 1990). “Discount usability engineering” meant that the methods were designed to save practitioners time and money over the standard lab usability study (Nielsen, 1989). They argued that there are 10 heuristics that products should adhere to for a good user experience (Nielsen, 1994). Three to five UX experts (or novices trained on the heuristics)—not end users or subject matter experts (SMEs)—individually assess a product by walking through a core set of tasks and noting any places where heuristics are violated. The evaluators then come together to combine their findings into a single report of issues that should be addressed. Note that it is difficult for every element in a product to adhere to all 10 heuristics, as they can sometimes be at odds. Additionally, products that adhere to all 10 heuristics are not guaranteed to meet users’ needs, but it is significantly less likely that they will face the barriers of poor design. Nielsen’s heuristics are as follows:

1. Visibility of system status: Keep the user informed about the status of your system and give them feedback in a reasonable time.

2. Match between system and the real world: Use terminology and concepts the user is familiar with and avoid technical jargon. Present information in a logical order and follow real-world conventions.

3. User control and freedom: Allow users to control what happens in the system and be able to return to previous states (e.g., undo, redo).

4. Consistency and standards: Be consistent throughout your product (e.g., terminology, layout, actions). Follow known standards and conventions.

5. Error prevention: To the greatest extent possible, help users avoid making errors, make it easy for users to see when an error has been made (i.e., error checking), and give users a chance to fix them before committing to an action (e.g., confirmation dialog).

6. Recognition rather than recall: Do not force users to rely on their memory to use your system. Make options or information (e.g., instructions) visible or easily accessible across your product when needed.

7. Flexibility and efficiency of use: Make accelerators available for expert users but hidden for novice ones. Allow users to customize the system based on their frequent actions.

8. Aesthetic and minimalist design: Avoid irrelevant information, and hide infrequently needed information. Keep the design to a minimum to avoid overloading the user’s attention.

9. Help users recognize, diagnose, and recover from errors: Although your system should prevent errors in the first place, when they do happen, provide error messages in clear terms (no error codes) that indicate the problem and how to recover from it.

10. Help and documentation: Ideally, your system should be used without documentation; however, that is not always realistic. When help or documentation is needed, make it brief, easy to find, focused on the task at hand, and clear.

We have provided a worksheet at http://tinyurl.com/understandingyourusers to help you conduct heuristic evaluations.
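If you prefer to aggregate evaluators' findings with a small script rather than by hand, the sketch below shows one way to do it. The severity scale, field layout, and example findings are assumptions for illustration only; they are not part of Nielsen's method or of the worksheet.

```python
from collections import defaultdict

# Hypothetical structure: each evaluator logs (screen, heuristic, severity 1-4, note).
findings = [
    ("Checkout", "Visibility of system status", 3, "No progress indicator while payment processes"),
    ("Checkout", "Error prevention", 4, "No confirmation before charging the card"),
    ("Search",   "Recognition rather than recall", 2, "Filter syntax must be memorized"),
]

# Group the combined findings by heuristic so the team can prioritize fixes.
by_heuristic = defaultdict(list)
for screen, heuristic, severity, note in findings:
    by_heuristic[heuristic].append((severity, screen, note))

for heuristic, issues in sorted(by_heuristic.items()):
    worst = max(sev for sev, _, _ in issues)
    print(f"{heuristic} (worst severity {worst}):")
    for severity, screen, note in sorted(issues, reverse=True):
        print(f"  [{severity}] {screen}: {note}")
```

Sorting the combined list by heuristic and severity makes it easier to turn three to five individual reports into the single prioritized report the evaluators agree on.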

Cognitive Walkthroughs

Cognitive walkthroughs are a formative usability inspection method (Lewis, Polson, Wharton, & Rieman, 1990; Polson, Lewis, Rieman, & Wharton, 1992; Nielsen, 1994). Whereas the heuristic evaluation looks at a product or system holistically, the cognitive walkthrough is task-specific. It is based on the belief that people learn systems by trying to accomplish tasks with them, rather than by first reading through instructions. It is ideal for products that are meant to be walk-up-and-use (i.e., no training required).

In a group of three to six people, your colleagues or SMEs are asked to put themselves in the shoes of the intended user group and walk through a scenario. To increase the validity and reliability of the original method, Jacobsen and John (2000) recommended including individuals with a variety of backgrounds that span your intended user audience (to increase the likelihood of catching issues), creating scenarios that cover the full functionality of the system, and having your colleagues consider multiple points of view of the intended user group (e.g., users booking a flight for themselves versus for someone else). State a clear goal that the user wants to achieve (e.g., book a flight, check in for a flight) and make sure everyone understands that goal. The individual conducting the walkthrough presents the scenario and then shows everyone one screen at a time (e.g., a mobile app or airline kiosk screen). As each screen is presented, everyone is asked to write down their answers to four questions:

1. Is this what you expected to see?

2. Are you making progress toward your goal?

3. What would your next action be?

4. What do you expect to see next?

Going around the room, the evaluator asks each individual to state his or her answers and provide any related thoughts. For example, if participants feel they are not making progress toward their goal, they state why that is. A separate notetaker should record any areas where expectations were violated, as well as any other usability issues identified.

Conduct two or three group sessions to ensure you have covered your scenarios and identified the range of issues. When examining the individual issues identified, consider whether each issue applies more generally across the product (Jacobsen & John, 2000). For example, your colleagues may have noted that they want to live chat with a customer service agent when booking their flight. Consider whether there are other times users may want to live chat with an agent and, if so, whether that feature should be made more widely available. Ideally, you would iterate on the designs and conduct another round to ensure you have addressed the issues.
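If you want a lightweight way to structure the notetaker's records, the sketch below pairs each participant's answers with the four standard questions, one record per screen. The record structure, participant label, and example answers are assumptions for illustration, not part of the method itself.

```python
# Hypothetical note-taking structure for a cognitive walkthrough session:
# one record per (participant, screen) holding answers to the four questions.
QUESTIONS = (
    "Is this what you expected to see?",
    "Are you making progress toward your goal?",
    "What would your next action be?",
    "What do you expect to see next?",
)

def new_record(participant, screen, answers):
    """Pair the four standard questions with one participant's answers for one screen."""
    assert len(answers) == len(QUESTIONS)
    return {"participant": participant, "screen": screen,
            "answers": dict(zip(QUESTIONS, answers))}

record = new_record(
    participant="P1",
    screen="Flight search results",
    answers=("No, I expected a seat map", "Not sure", "Tap the cheapest fare", "A booking summary"),
)

# Flag screens where expectations were violated so the notetaker can follow up.
if record["answers"][QUESTIONS[0]].lower().startswith("no"):
    print(f'Expectation violated on "{record["screen"]}" for {record["participant"]}')
```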

Usability Testing

Usability testing is the systematic observation of end users attempting to complete a task or set of tasks with your product based on representative scenarios. In individual sessions, participants interact with your product (e.g., paper prototype, low- or high-fidelity prototype, the launched product) as they think aloud (refer to Chapter 7, “Using a Think-Aloud Protocol” section, page 169), and user performance is evaluated against metrics such as task success, time on task, and conversion rate (e.g., whether or not the participant made a purchase). Several participants are shown the same product and asked to complete the same tasks in order to identify as many usability issues as possible.

There is a lot of debate about the number of participants needed for usability evaluation (see Borsci et al. (2013) for an academic evaluation and Sauro (2010) for a great history on the sample size debate). Nielsen and Landauer (1993) found that you get a better return on your investment if you conduct multiple rounds of testing; however, only five participants are needed per round. In other words, you will find more usability issues conducting three rounds of testing with five participants each if you iterate between rounds (i.e., make changes or add features/functionality to your prototype or product based on each round of feedback) than if you conduct a single study with 15 participants. If you have multiple, distinct user types, you will want to include three to four participants from each user type in your study per round.
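The reasoning behind the "five participants per round" guidance can be made concrete with the problem-discovery model from Nielsen and Landauer (1993): the proportion of problems found with n participants is 1 - (1 - L)^n, where L is the probability that a single participant reveals a given problem. The sketch below uses L = 0.31, the average reported in their data; treat that value as an assumption, since it varies by product and task.

```python
# Sketch of the problem-discovery model from Nielsen and Landauer (1993):
# proportion of usability problems found with n participants = 1 - (1 - L)**n,
# where L is the probability that one participant reveals a given problem.
# L = 0.31 is the average reported in their data; your product may differ.
L = 0.31

for n in (1, 3, 5, 10, 15):
    found = 1 - (1 - L) ** n
    print(f"{n:2d} participants -> ~{found:.0%} of problems found")

# Diminishing returns: with L = 0.31, 5 participants already uncover roughly 84%
# of problems, which is why several small, iterated rounds beat one large study.
```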

There are a few variations on the in-person usability test that you can choose from based on your research questions, space availability, user availability, and budget.

Lab Study

This involves bringing users to a dedicated testing space within your company or university or at a vendor’s site. If your organization does not have a formal lab space for conducting usability studies, you can create your own impromptu lab with a conference room, laptop, screen recording software, and (optionally) a video camera. See Chapter 4, “Setting Up Research Facilities” on page 82 to learn more. An academic or corporate lab environment likely does not match the user’s environment. It probably has high-end equipment and a fast Internet connection, looks like an office, and is devoid of any distractions (e.g., colleagues, spouses, or kids making noise and interrupting you). Although a lab environment may lack ecological validity (i.e., it does not mimic the real-world environment), it offers everyone a consistent experience and allows participants to focus on evaluating your product.

Suggested Resources for Further Reading

The following books offer detailed instruction for preparing, conducting, and analyzing usability tests:

 Barnum, C. M. (2010). Usability testing essentials: ready, set … test!. Elsevier.

 Dumas, J. S., & Loring, B. A. (2008). Moderating usability tests: Principles and practices for interacting. Morgan Kaufmann.

Eye Tracking

One variation on a lab study incorporates a special piece of equipment called an eye tracker. Most eye trackers are used at a desktop, but there are also mobile eye trackers that can be used in the field (e.g., in a store for shopping studies, in a car for automotive studies). Eye tracking was first used in cognitive psychology (Rayner, 1998); however, the HCI community has adapted it to study where people look (or do not look) for information or functionality and for how long. Figures 14.1 and 14.2 show desktop and mobile eye tracking devices. By recording participants’ fixations and saccades (i.e., rapid eye movements between fixation points), a heat map can be created (see Figure 14.3). The longer participants’ gazes stay fixed on a spot, the “hotter” the area is on the map, indicated by the color red. As fewer participants look at an area, or look for less time, the “cooler” it gets and transitions to blue. Areas where no one looked are black. By seeing where people look for information or features, you can determine whether participants discover and process an item. If participants’ eyes do not dwell on an area of an interface, there is no way for them to process that area. This information can help you decide whether changes are needed to your design to make something more discoverable.

Figure 14.1 Tobii x1 Light Eye Tracker.
Figure 14.2 Tobii eye tracking glasses in a shopping study.
Figure 14.3 Eye tracking heat map.
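To make the aggregation step concrete, the sketch below shows one way fixation durations can be binned into a grid that a plotting library could then render as a heat map. The fixation format, screen size, and cell size are assumptions for illustration, not the export format of any particular tracker.

```python
import numpy as np

# Minimal sketch of aggregating fixation data into a heat map grid.
# Assumed fixation format: (x, y, duration_ms) in screen coordinates.
fixations = [(420, 310, 180), (430, 305, 250), (900, 120, 90), (415, 320, 400)]

SCREEN_W, SCREEN_H, CELL = 1280, 800, 40  # 40-px grid cells
grid = np.zeros((SCREEN_H // CELL, SCREEN_W // CELL))

for x, y, duration in fixations:
    grid[y // CELL, x // CELL] += duration  # longer dwell -> "hotter" cell

# The grid can then be smoothed and color-mapped (e.g., with matplotlib's imshow)
# to produce a red-to-blue heat map like Figure 14.3.
hottest = np.unravel_index(grid.argmax(), grid.shape)
print("Hottest cell (row, col):", hottest, "total dwell:", grid[hottest], "ms")
```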

Eye tracking studies are the one type of evaluation methodology where you do not want participants to think aloud as they interact with your product. Asking participants to think aloud will cause them to change their eye gaze as they speak with the evaluator or recall past remarks (Kim, Dong, Kim, & Lee, 2007). This will muddy the eye tracking data and should be avoided. One work-around is the retrospective think-aloud, in which participants are shown a video of their session and asked to tell the moderator what they were thinking at the time (Russell & Chi, 2014). One study found that showing participants a gaze plot or gaze video cue during the retrospective think-aloud resulted in the identification of more usability issues (Tobii Technology, 2009). Be aware that this approach doubles the length of your study session, so most researchers using it include half the number of tasks they normally would in order to keep the entire session to an hour.

Suggested Resources for Further Reading

Bojko, A. (2013). Eye tracking the user experience: A practical guide to research. New York: Rosenfeld Media.

Rapid Iterative Testing and Evaluation

In 2002, the Microsoft games division developed Rapid Iterative Testing and Evaluation (RITE) as a formative method to quickly address issues that prevented participants from proceeding in a game and from evaluating the remaining functionality (Medlock, Wixon, Terrano, Romero, & Fulton, 2002). Unlike traditional usability testing, which is meant to identify as many usability issues as possible and measure their severity, RITE is designed to quickly identify any large usability issue that prevents users from completing a task or keeps the product from meeting its stated goals. RITE studies should be conducted early in the development cycle with a prototype. The development team must observe all usability sessions, and following each session where a blocking usability issue is identified, they agree on a solution. The prototype is then updated, and another session is conducted to see if the solution fixed the problem. If the team cannot agree on the severity of a problem, an additional session can be conducted before any changes are made. This cycle of immediately fixing and testing updated prototypes continues until multiple sessions are conducted in which no further issues are identified. In contrast to traditional usability testing, where five or more participants see the same design, in RITE at most two participants see the same design before changes are made for the next session.

RITE typically requires more sessions, and therefore more participants, than a single traditional usability study with five to eight participants. Additionally, because the method requires the development team to observe all sessions and brainstorm solutions after each one, and someone to update the prototype quickly and repeatedly, it is more resource-intensive. Overall, it can be perceived as a risky method because a lot of resources are invested early, based on initially small sample sizes. However, if done early in the development cycle, the team can feel confident they are building a product free of major usability issues.

Café Study

At Google, when we need to make a quick decision about which of several design directions to take, we may conduct a 5- to 10-minute study with guests who visit our cafés for lunch. In only a couple of hours, we can get feedback from 10 or more participants using this formative method. Although the participants may be friends or family members of Googlers, there is surprising diversity in skills, demographics, and familiarity with Google products. We are able to collect enough data in a very short period of time to inform product direction and identify critical usability issues, confusing terminology, etc. You could do this at any café, in front of a grocery store, at a mall, etc. Of course, you need the permission of the store manager or owner.

Tip

Buying gift cards from the manager or owner of the establishment where you wish to conduct your study and offering them as participant incentives makes the study mutually beneficial to everyone.

In the Field

You can increase the ecological validity of your study by conducting evaluations in the field. This will give you a better sense of how people will use your product in the “real world.” If your product will be used at home, you could conduct the study in the participants’ homes, for example. See Chapter 13, “Field Studies” on page 380 for tips on conducting field research. This can be done either very early or much later in the product development life cycle, depending on your goals (i.e., identify opportunities and inform product direction or measure the product against a set of metrics).

Desirability Testing

To be successful, it is not enough for a product to be usable (i.e., users can complete a specified set of tasks with the product); it must also be pleasant to use and desirable. Don Norman (2004) argued that aesthetically pleasing products are actually more effective. The games division at Microsoft introduced another methodology in 2002, this time focusing on emotions rather than usability issues (Benedek & Miner, 2002). Desirability testing evaluates whether or not a product elicits the desired emotional response from users. It is most often conducted with a released version of your product (or a competitor’s product) to see how it makes participants feel. The Microsoft researchers identified a set of 118 positive, negative, and neutral adjectives based on market research and their own research (e.g., unconventional, appealing, inconsistent, professional, motivating, intimidating). You may also have specific adjectives in mind that you would (or would not) like users to associate with your product. You can add those to the list; however, be mindful to keep a balance of positive and negative adjectives.

To conduct the method, create a set of flash cards with a single adjective per index card. After interacting with the product, perhaps following your standard usability study, hand the stack of cards to participants. Ask them to pick anywhere from five to ten cards from the stack that describe how the product made them feel. Then, ask participants to tell you why they chose each card. The researchers suggest conducting this method with 25 participants per user segment. From here, you can do affinity diagramming on the themes participants highlighted. If your product does not elicit the emotions you had hoped for, you can make changes as needed (e.g., adding/removing functionality, changing the tone of messaging, adding different visuals) and retest.
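If you tally the card choices across participants before affinity diagramming, a few lines of code are enough. The sketch below is illustrative only; the participant IDs and the handful of adjectives shown stand in for the full 118-card set.

```python
from collections import Counter

# Illustrative sketch: tally which adjectives participants chose from the card deck.
# The adjective lists below are a tiny sample, not the full 118-word set.
selections = {
    "P1": ["appealing", "professional", "intimidating"],
    "P2": ["appealing", "motivating", "unconventional"],
    "P3": ["inconsistent", "appealing", "professional"],
}

counts = Counter(adj for cards in selections.values() for adj in cards)
for adjective, n in counts.most_common():
    print(f"{adjective}: chosen by {n} of {len(selections)} participants")

# The "why" behind each choice still comes from the follow-up interview and
# affinity diagramming, not from the counts alone.
```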

Remote Testing

It is not always possible, or even desirable, to conduct evaluations with participants in person. For example, if you are based in any of the tech-savvy regions of the country, conducting only in-person sessions around your company will result in a sampling bias. You may also miss out on any region-specific issues users face (e.g., customer service hours are unfriendly to one or more time zones). Remote testing can help you gather data from participants outside of your geographic area. Another benefit of remote testing is that you can typically collect feedback from a much larger sample size in a shorter period of time, and no lab facilities are needed. Unfortunately, if you are conducting studies on hardware devices or with highly confidential products, you may still need to conduct studies in person.

There are two ways to conduct remote studies:

1. Use online vendors or services to conduct evaluations with their panels (e.g., UserZoom, UserTesting.com).

2. Use a service like GoToMeeting, WebEx, or Hangouts to remotely connect to the participant in his or her environment (e.g., home, office) while you remain in your lab or office. You will need to call or e-mail directions to the participant for how to connect your computer to his or her computer to share screens, and it may be necessary to walk participants through the installation step-by-step over the phone. Once you have done this, you can show the participant your prototype and conduct the study as you normally would in the lab. Alternatively, you can ask the participant to show you his or her computer or, using the web cam, show his or her environment, mobile device, etc. You will also need to leave time at the end of the session to walk participants through the process of uninstalling the application.

Tip

It is important to realize that not all participants will be comfortable installing something on their computers, so you will need to notify participants during the recruitment phase about what you will be asking them to install so they can ask questions and be completely informed when deciding whether or not to participate. Do not put the participant in the awkward position of having to say, “I’m not comfortable with that” at the very beginning of the session. You should also be prepared to help the participant with technical troubleshooting if the application causes problems for his or her computer. Do not leave the participant stranded with computer problems as a result of agreeing to participate in your study! This may mean getting on the phone with your IT department or customer support for the company whose application you are using to ensure the participant’s issues are resolved.

Live Experiments

Live experiments, from an HCI standpoint, are a summative evaluation method that involves comparing two or more designs (live websites) to see which one performs better (e.g., higher click-through rate, higher conversion rate). To avoid biasing the results, users in industry studies are usually not informed that they are part of an experiment; in academia, however, consent is often required. In A/B testing, a percentage of users are shown one design (“A”), and, via logs analysis, its performance is compared against another version (“B”). Designs can be a variation on a live control (typically the current version of your product) or two entirely new designs. See Figure 14.4 for an illustration. Multivariate testing follows the same principle, but in this case, multiple variables are manipulated to examine how changes in those variables interact to produce the ideal combination. All versions must be tested in parallel to control for extraneous variables that could affect your experiment (e.g., a website outage, a change in fees for your service).

Figure 14.4 A/B testing.

You will need a large enough sample per design to conduct statistical analysis and detect any significant differences; however, multivariate testing requires a far larger sample than simple A/B testing because of the number of combinations under consideration.
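As a rough illustration of the kind of analysis involved, the sketch below runs a two-proportion z-test on hypothetical conversion counts for designs A and B. The counts are invented; a real experiment should also include an up-front power analysis to choose the sample size per arm, and your experimentation platform will typically handle this for you.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical conversion counts for a simple A/B test.
conversions_a, visitors_a = 480, 10_000
conversions_b, visitors_b = 545, 10_000

p_a, p_b = conversions_a / visitors_a, conversions_b / visitors_b
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed test

print(f"A: {p_a:.2%}  B: {p_b:.2%}  z = {z:.2f}  p = {p_value:.3f}")
```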

There are a few free tools and several fee-based online services that can enable you to conduct live experiments. Simply do a web search for “website optimization” or “A/B testing,” and you will find several vendors and tools to help you.

Suggested Resources for Further Reading

Designing successful experiments is not an easy task. We recommend the two books below. The first offers a great introduction to this topic, and the second provides information about statistics for user research that will be extremely helpful when analyzing your data.

Siroker, D., & Koomen, P. (2013). A/B testing: The most powerful way to turn clicks into customers. John Wiley & Sons.

Sauro, J., & Lewis, J. R. (2012). Quantifying the user experience: Practical statistics for user research. Elsevier.

Data Analysis and Interpretation

In addition to identifying usability issues, there are a few metrics you may consider collecting in a summative evaluation:

 Time on task: Length of time to complete a task.

 Number of errors: Errors made completing a task and/or across the study.

 Completion rate: Number of participants who completed the task successfully.

 Satisfaction: Overall, how satisfied participants are on a given task and/or with the product as a whole at the end of the study (e.g., “Overall, how satisfied or dissatisfied are you with your experience? Extremely dissatisfied, Very dissatisfied, Moderately dissatisfied, Slightly dissatisfied, Neither satisfied nor dissatisfied, Slightly satisfied, Moderately satisfied, Very satisfied, Extremely satisfied”).

 Page views or clicks: As a measure of efficiency, you can compare the number of page views or clicks by a participant against the most efficient/ideal path. Of course, the optimal path for a user may not be his or her preferred path. Collecting site analytics can tell you what users do but not why. To understand why users traverse your product in a certain path, you must conduct other types of studies (e.g., lab study, field study).

 Conversion: Usually measured in a live experiment, this is a measure of whether or not participants (users) “converted” or successfully completed their desired task (i.e., signed up, made a purchase).

In a benchmarking study, you can compare the performance of your product or service against that of a competitor or a set of industry best practices. Keep in mind that with small sample sizes (e.g., in-person studies), you should not conduct statistical tests on the data and expect it will be representative of your broader population. However, it can be helpful to compare the metrics between rounds of testing to see if design solutions are improving the user experience.
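If you capture the metrics above in a simple per-participant log, summarizing them takes only a few lines. The record layout and numbers below are assumptions for illustration.

```python
from statistics import mean

# Sketch only: summarizing per-task metrics from a small summative study.
# Each record is (participant, completed, time_on_task_seconds, errors) for one task.
sessions = [
    ("P1", True, 74, 0), ("P2", True, 92, 1), ("P3", False, 130, 3),
    ("P4", True, 66, 0), ("P5", True, 101, 2),
]

completion_rate = sum(1 for _, done, _, _ in sessions if done) / len(sessions)
mean_time_success = mean(t for _, done, t, _ in sessions if done)
mean_errors = mean(e for *_, e in sessions)

print(f"Completion rate: {completion_rate:.0%}")
print(f"Mean time on task (successful attempts): {mean_time_success:.0f} s")
print(f"Mean errors per participant: {mean_errors:.1f}")

# With samples this small, compare these numbers across rounds of testing rather
# than running inferential statistics.
```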

Suggested Resources for Further Reading

Albert, W., & Tullis, T. (2013). Measuring the user experience: Collecting, analyzing, and presenting usability metrics. Morgan Kaufmann.

Communicating the Findings

For all of the evaluation methods listed, you should document the design that was evaluated to avoid repeating the same mistakes in later versions. This might be as simple as including screenshots in a slide deck or maintaining version control on a prototype so it is easy to see exactly what the participant experienced.

In small sample size or informal studies (e.g., cognitive walkthrough, café study, RITE), a simple list of issues identified and recommendations is usually sufficient to communicate with your stakeholders. A brief description of the participants can be important, particularly if there are any caveats stakeholders should be aware of (e.g., only company employees participated for confidentiality reasons). These are meant to be fast, lightweight methods that do not slow down the process with lengthy documentation.

For larger or more complex studies (e.g., eye tracking, live experiment), you will want to include a description of the methodology, participant demographics, graphs (e.g., heat map, click path), and any statistical analysis. Different presentation formats are usually required for different stakeholders. For example, it is unlikely engineers or designers will care about the details of the methodology or statistical analysis, but other researchers will. A simple presentation of screenshots showing the issues identified and recommendations is often best for the majority of your stakeholders, but we recommend creating a second report that has all of the details mentioned above, so if questions arise, you can easily answer (or defend) your results.

Pulling It All Together

In this chapter, we have discussed several methods for evaluating your product or service. There is a method available for every stage in your product life cycle and for every schedule or budget. Evaluating your product is not the end of the life cycle. You will want (and need) to continue other forms of user research so you continually understand the needs of your users and how to best meet them.

Case Study: Applying Cognitive Walkthroughs in Medical Device User Interface Design

Arathi Sethumadhavan    Manager, Connectivity Systems Engineering, Medtronic, Inc.

Medtronic, Inc., is the world’s largest medical technology company. As a human factors scientist at Medtronic, my goal is to proactively understand the role of the user and the use environment, design products that minimize use error that could lead to user or patient harm, and maximize clinical efficiency and product competitiveness by promoting ease of learning and ease of use. I have been conducting human factors research in the Cardiac Rhythm and Disease Management (CRDM) division of Medtronic, which is the largest and oldest business unit of Medtronic. In this case study, I describe how I used one human factors user research technique, a lightweight cognitive walkthrough, on a heart failure project at CRDM.

Heart failure is a condition in which the heart does not pump enough blood to meet the body’s needs. According to the Heart Failure Society of America, this condition affects five million Americans with 400,000-700,000 new cases of heart failure diagnosed each year. Cardiac resynchronization therapy (CRT) is a treatment for symptoms associated with heart failure. CRT restores the coordinated pumping of the heart chambers by overcoming the delay in electrical conduction. This is accomplished by a CRT pacemaker, which includes a lead in the right atrium, a lead in the right ventricle, and a lead in the left ventricle. These leads are connected to a pulse generator that is placed in the patient’s upper chest. The pacemaker and the leads maintain coordinated pumping between the upper and the lower chambers of the heart, as well as the left and right chambers of the heart. The location of the leads and the timing of pacing are important factors for successful resynchronization. For patients with congestive heart failure who are at high risk of death due to their ventricles beating fast, a CRT pacemaker that includes a defibrillator is used for treatment.

The Attain Performa® quadripolar lead is Medtronic’s new left ventricle (LV) lead offering, which provides physicians more options to optimize CRT delivery. This lead provides 16 left pacing configurations that allow for electronic repositioning of the lead without surgery if a problem (e.g., phrenic nerve stimulation, high threshold) arises during implant or follow-up. Though the lead offers several programming options during implant and over the long-term course of therapy, the addition of 16 pacing configurations has the potential to increase clinician workload. To reduce clinician workload and expedite clinical efficiency, Medtronic created VectorExpress™, a smart solution that replaces the 15- to 30-minute effort involved in manually testing all 16 pacing configurations with a one-button click. VectorExpress™ completes the testing in two to three minutes and provides electrical data that clinicians can use to determine the optimal pacing configuration. This feature is a big differentiator from the competitive offering.

Uniqueness of the Medical Domain

An important aspect that makes conducting human factors work in the medical device industry different from non-healthcare industries is the huge emphasis regulatory bodies place on minimizing user errors and use-related hazards caused by inadequate medical device usability. International standards on human factors engineering specify processes that medical device manufacturers should follow to demonstrate that a rigorous usability engineering process has been adopted and that risks to user or patient safety have been mitigated. This means analytic techniques (e.g., task analysis, interviews, focus groups, heuristic analysis), formative evaluations (e.g., cognitive walkthrough, usability testing), and validation testing with a production-equivalent system and at least 15 participants from each representative user group are required to optimize medical device design. Compliance with standards also requires maintaining records showing that the usability engineering work has been conducted. Though a variety of user feedback techniques were employed in this project as well, this case study focuses on a lightweight cognitive walkthrough with subject matter experts, which we used to gather early feedback from users on design ideas before creating fully functional prototypes for rigorous usability testing. Cognitive walkthroughs are a great technique for discovering users’ reactions to concepts proposed early in the product development life cycle, to determine whether we are going in the right direction.

Preparing for the Cognitive Walkthroughs

The cognitive walkthrough materials included the following:

 An introduction of the Attain Performa quadripolar lead and the objective of the interview session.

 Snapshots of user interface designs being considered, in a Microsoft PowerPoint format. Having a pictorial representation of the concepts makes it easier to communicate our thoughts with end users and, in turn, gauge users’ reactions.

 Clinical scenarios that would help to evaluate the usefulness of the proposed feature. Specifically, participants were presented with two scenarios: implant, where a patient is being implanted with a CRT device, and follow-up, where a patient has come to the clinic complaining of phrenic nerve stimulation (i.e., hiccups).

 A data collection form. For each scenario, a table was created with “questions to be asked during the scenario” (e.g., “When will you use the test?” “How long would you wait for the automated test to run during an implant?” “Under what circumstances would you want to specify a subset of vectors on which you want to run the test?” “How would you use the information in the table to program a vector?”) and “user comments” as column headers. Each question had its own row in the table.

Conducting the Cognitive Walkthroughs

Cognitive walkthroughs were conducted at Medtronic’s Mounds View, Minnesota, campus with physicians. A total of three cognitive walkthroughs (CWs) were conducted. Unlike studies that are conducted in a clinic or hospital where physicians take time out of their busy day to talk to us and where there is a higher potential of interruptions, studies conducted at Medtronic follow a schedule, with physicians dedicating their time to these sessions. Though three CWs at first glance seem like a small sample size, it is important to point out that we followed up with multiple rounds of usability testing with high-fidelity, interactive prototypes later on.

Each CW session included a human factors scientist and a research scientist. The purpose of including the research scientist was to have a domain expert who was able to describe the specifics of the VectorExpress™ algorithm. It is also good practice to include your project team in the research because this helps them understand user needs and motivations firsthand. Both the interviewers had printed copies of the introductory materials and the data collection forms. The PowerPoint slides illustrating the user interface designs were projected onto a big screen in the conference room.

The session began with the human factors scientist giving physicians an overview of the feature that Medtronic was considering and describing the objective of the session. This was followed by the research scientist giving an overview of how the VectorExpress™ algorithm works—in other words, a description of how the algorithm is able to take the electrical measurements of all the LV pacing configurations. Then, using the context of an implant and a follow-up scenario, the human factors scientist presented the design concepts and asked participants how they envisioned the feature being used. This was a “lightweight” CW, meaning that we did not ask participants each of the four standard questions recommended by Polson et al. (1992). Time with the participants was extremely limited, and therefore, in order to get as much feedback about the design concepts as possible, we focused on interviewing participants in depth about each screen they saw. Both interviewers recorded notes in their data collection forms.

Analyzing Information from Cognitive Walkthroughs

The human factors scientist typed up the notes from the CWs, entering the responses to each question in the data collection form. A “key takeaways” section was then written for each CW session. The document was sent to the research scientist for review and edits, and the report from each CW session was submitted to the cross-functional team (i.e., Systems Engineering, Software Engineering, and Marketing). Note that these one-on-one CW sessions were also preceded by a focus group with electrophysiologists to gather additional data from a larger group. After all of these sessions, we came together as a cross-functional team and identified key takeaways and implications for user interface design based on the learnings from the CWs and focus groups.

Next Steps

 The feedback obtained from the CWs helped us to conclude that overall, we were going in the right direction, and we were able to learn how users would use the proposed feature during CRT device implants and follow-ups. The CWs also provided insights on the design of specific user interface elements.

 In preparation for formative testing, we developed high-fidelity software prototypes and generated a test plan with a priori definition of usability goals and success criteria for each representative scenario. We worked with Medtronic field employees to recruit representative users for formative testing.

 We also conducted a user error analysis on the proposed user interface, to evaluate potential user errors and any associated hazards.

Things to Remember

 When conducting any research study, including CWs, flexibility is important. Research sessions with key opinion leaders rarely follow a set agenda. Sessions with highly skilled users, such as electrophysiologists, involve a lot of discussion, with the physicians asking many in-depth technical questions. Be prepared by anticipating ahead of time the technical questions they might ask. In the past, I have created a cheat sheet with a list of potential questions and their answers. These cheat sheets should be developed with input from a technical expert.

 Involve cross-functional partners such as the project systems engineer or the research scientist (who are domain experts) in the user research process. They have a much more in-depth understanding of the system, which complements the role of the human factors engineer.

 Most research studies run into the issue of taking what users say at face value. It is important to question in depth the motivation behind a perceived user need before jumping to conclusions. In addition, it is important to triangulate the data with conclusions derived from other techniques, such as behavioral observations and formative testing.

References

Nielsen J. Estimating the number of subjects needed for a thinking aloud test. International Journal of Human-Computer Studies. 1994;41:385–397.

Borsci S, Macredie RD, Barnett J, Martin J, Kuljis J, Young T. Reviewing and extending the five-user assumption: A grounded procedure for interaction evaluation. ACM Transactions on Computer-Human Interaction. 2013;20(5):29. doi:10.1145/2506210. Retrieved from http://doi.acm.org/10.1145/2506210.

Jacobsen NE, John BE. Two case studies in using cognitive walkthrough for interface evaluation (Technical Report No. CMU-CS-00-132). Pittsburgh, PA: Carnegie Mellon University, School of Computer Science; 2000.

Nielsen J. Usability engineering at a discount. In: Proceedings of the third international conference on human-computer interaction on designing and using human-computer interfaces and knowledge based systems. Elsevier Science Inc.; 1989.

Nielsen J, Landauer TK. A mathematical model of the finding of usability problems. In: Proceedings of the INTERACT’93 and CHI’93 conference on human factors in computing systems, ACM; 1993:206–213.

Norman DA. Emotional design: Why we love (or hate) everyday things. New York: Basic Books; 2004.

Polson PG, Lewis C, Rieman J, Wharton C. Cognitive walkthroughs: A method for theory-based evaluation of user interfaces. International Journal of Man-Machine Studies. 1992;36(5):741–773.

Rayner K. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin. 1998;124(3):372.

Sauro J. A brief history of the magic number 5 in usability testing. 2010. Retrieved from https://www.measuringusability.com/blog/five-history.php.

Tobii Technology. Retrospective think aloud and eye tracking: Comparing the value of different cues when using the retrospective think aloud method in web usability testing. 2009. Retrieved from http://www.tobii.com/Global/Analysis/Training/WhitePapers/Tobii_RTA_and_EyeTracking_WhitePaper.pdf.

Nielsen J, Molich R. Heuristic evaluation of user interfaces. In: Proceedings of the SIGCHI conference on human factors in computing systems, ACM; 1990:249–256.

Lewis C, Polson P, Wharton C, Rieman J. Testing a walkthrough methodology for theory-based design of walk-up-and-use interfaces. In: CHI ’90 Proceedings; ACM; 1990:235–242.

Kim B, Dong Y, Kim S, Lee KP. Development of integrated analysis system and tool of perception, recognition, and behavior for web usability test: With emphasis on eye-tracking, mouse-tracking, and retrospective think aloud. In: Usability and internationalization. HCI and culture. Berlin, Heidelberg: Springer; 2007:113–121.

Russell DM, Chi EH. Looking back: Retrospective study methods for HCI. Ways of knowing in HCI. New York: Springer; 2014 pp. 373–393.

Benedek J, Miner T. Measuring desirability: New methods for evaluating desirability in a usability lab setting. In: Proceedings of UPA 2002 Conference, Orlando, FL; 2002.

Medlock MC, Wixon D, Terrano M, Romero R, Fulton B. Using the RITE method to improve products: A definition and a case study. Usability Professionals Association; 2002.

