Chapter review questions

1. Assume you need to conduct a single-shot (not iterative) formative usability study that can detect about 85% of the problems that have a probability of occurrence of 0.25 for the specific participants and tasks used in the study (in other words, not 85% of ALL possible usability problems, but 85% of the problems discoverable with your specific method). How many participants should you plan to run?
2. Suppose you decide that you will maintain your goal of 85% discovery, but need to set the target value of p to 0.20. Now how many participants do you need?
3. You just ran a formative usability study with 4 participants. What percentage of the problems of p = 0.50 are you likely to have discovered? What about p = 0.01; 0.90; 0.25?
4. Table 7.11 shows the results of a formative usability evaluation of an interactive voice response application (Lewis, 2008) in which six participants completed four tasks, with the discovery of twelve distinct usability problems. For this matrix, what is the observed value of p across these problems and participants?
5. Continuing with the data in Table 7.11, what is the adjusted value of p?
6. Using the adjusted value of p, what is the estimated total number of the problems available for discovery with these tasks and types of participants? What is the estimated number of undiscovered problems? How confident should you be in this estimate? Should you run more participants, or is it reasonable to stop?

Table 7.11

Results from Lewis (2008) Formative Usability Study

Participant 1 2 3 4 5 6 7 8 9 10 11 12 Count Proportion
1 X X 2 0.17
2 X X X X X 5 0.42
3 X X 2 0.17
4 X X X 3 0.25
5 X X X X 4 0.33
6 X X X X 4 0.33
Count 1 2 3 1 1 1 1 1 1 1 5 2 20

Note: X = specified participant experienced specified problem

Answers to chapter review questions

1. From Table 7.1, when p = 0.25, you need to run seven participants to achieve the discovery goal of 85% [P(x ≥ 1) = 0.85]. Alternatively, you could search the row in Table 7.2 for p = 0.25 until you find the sample size at which the value in the cell first exceeds 0.85, which is at n = 7.
2. Tables 7.1 and 7.2 do not have entries for p = 0.20, so you need to use the formula below, which indicates a sample size requirement of 9 (8.5 rounded up).
$$n = \frac{\ln(1 - P(x \ge 1))}{\ln(1 - p)} = \frac{\ln(1 - 0.85)}{\ln(1 - 0.20)} = \frac{\ln(0.15)}{\ln(0.80)} = \frac{-1.897}{-0.223} = 8.5$$
3. Table 7.2 shows that the expected percentage of discovery when n = 4 and p = 0.50 is 94%. For p = 0.01, it’s 4% expected discovery; for p = 0.90, it’s 100%; for p = 0.25, it’s 68%.
4. For the results shown in Table 7.11, the observed average value of p is 0.28. You can get this by averaging the average values across the six participants (shown in the table), the average values across the twelve problems (not shown in the table), or dividing the number of filled cells by the total number of cells [20/(6 × 12) = 20/72 = 0.28].
5. To compute the adjusted value of p, use the formula below. The deflation component is (0.28 – 1/6)(1 – 1/6) = 0.11(0.83) = 0.09. Because there were 12 distinct problems, 8 of which occurred once, the Good-Turing component is 0.28/(1 + 8/12) = 0.28/1.67 = 0.17. The average of these two components, which is the adjusted value of p, is 0.13.
$$p_{adj} = \frac{1}{2}\left[\left(p_{est} - \frac{1}{n}\right)\left(1 - \frac{1}{n}\right)\right] + \frac{1}{2}\left[\frac{p_{est}}{1 + GT_{adj}}\right]$$
6. The adjusted estimate of p (from Problem 5) is 0.13. We know from Table 7.11 that there were twelve problems discovered with six participants. To estimate the percentage of discovery so far, use $1 - (1 - p)^n$. Putting in the values of n and p, you get $1 - (1 - 0.13)^6 = 0.57$ (57% estimated discovery). If 57% discovery equals 12 problems, then the estimated number of problems available for discovery is 12/0.57 = 21.05 (rounds up to 22), so the estimated number of undiscovered problems is about 10. Because a sample size of 6 is in the range of over-optimism when using the binomial model, there are probably more than 10 problems remaining for discovery. Given the results shown in Table 7.9, it’s reasonable to believe that there could be an additional two to seven undiscovered problems, so it’s unlikely that there are more than 17 undiscovered problems. This low rate of problem discovery ($p_{adj} = 0.13$) is indicative of an interface in which there are few high-frequency problems to find. If there are resources to continue testing, it might be more productive to change the tasks in an attempt to create the conditions for discovering a different set of problems and, possibly, more frequently occurring problems.
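
The arithmetic in the answers above can be checked with a short script. What follows is a minimal sketch in Python, not code from the chapter: the function names (`discovery_rate`, `sample_size`, `adjusted_p`) are hypothetical, and the per-problem counts are copied from the Count row of Table 7.11. The script carries unrounded intermediate values, so its intermediate figures can differ slightly from the rounded ones quoted in the answers, although the final rounded results agree.

```python
import math


def discovery_rate(p, n):
    """Expected proportion of problems with occurrence probability p
    that appear at least once among n participants: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


def sample_size(p, goal):
    """Smallest whole n for which 1 - (1 - p)^n reaches the discovery goal."""
    return math.ceil(math.log(1 - goal) / math.log(1 - p))


def adjusted_p(p_est, n, n_problems, n_once):
    """Average of the deflation and Good-Turing adjustments of p_est.
    n_once is the number of problems that occurred exactly once."""
    deflated = (p_est - 1 / n) * (1 - 1 / n)
    good_turing = p_est / (1 + n_once / n_problems)
    return (deflated + good_turing) / 2


# Answers 1 and 2: participants needed for 85% discovery.
print(sample_size(0.25, 0.85))
print(sample_size(0.20, 0.85))

# Answer 3: expected discovery with n = 4 for several values of p.
for p in (0.50, 0.01, 0.90, 0.25):
    print(p, round(discovery_rate(p, 4), 2))

# Answers 4 and 5: observed and adjusted p from the Count row of Table 7.11.
problem_counts = [1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 5, 2]  # one entry per problem
n_participants = 6
n_problems = len(problem_counts)
p_obs = sum(problem_counts) / (n_participants * n_problems)
n_once = sum(1 for c in problem_counts if c == 1)
p_adj = adjusted_p(p_obs, n_participants, n_problems, n_once)
print(round(p_obs, 2), round(p_adj, 2))

# Answer 6: estimated total and undiscovered problems, using p_adj
# rounded to two places as in the answer text.
found_so_far = discovery_rate(round(p_adj, 2), n_participants)
total_estimate = math.ceil(n_problems / found_so_far)
print(round(found_so_far, 2), total_estimate, total_estimate - n_problems)
```

Rounding the total estimate up with `math.ceil` mirrors the answer to Problem 6, which rounds 21.05 up to 22 rather than to the nearest integer.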


