References
Agresti A. Simple capture-recapture models permitting unequal catchability and variable sampling effort. Biometrics. 1994;50:494–500.
Al-Awar J, Chapanis A, Ford R. Tutorials for the first-time computer user. IEEE Trans. Prof. Commun. 1981;24:30–37.
Baecker RM. Themes in the early history of HCI—some unanswered questions. Interactions. 2008;15(2):22–27.
Barnum, C., Bevan, N., Cockton, G., Nielsen, J., Spool, J., Wixon, D., 2003. The “Magic Number 5”: Is it enough for web testing? In: Proceedings of CHI 2003. Ft. Lauderdale, FL: ACM, pp. 698–699.
Boehm BW. Software Engineering Economics. Englewood Cliffs, NJ: Prentice-Hall; 1981.
Borsci S, Londei A, Federici S. The bootstrap discovery behaviour (BDB): a new outlook on usability evaluation. Cogn. Process. 2011;12:23–31.
Borsci S, MacRedie RD, Barnett J, Martin J, Kuljis J, Young T. Reviewing and extending the five-user assumption: A grounded procedure for interaction evaluation. ACM Trans. Comput. Hum. Interact. 2013;20(5):1–23.
Bradley JV. Probability; Decision; Statistics. Englewood Cliffs, NJ: Prentice-Hall; 1976.
Bradley JV. Robustness? Br. J. Math. Stat. Psychol. 1978;31:144–152.
Briand LC, El Emam K, Freimut BG, Laitenberger O. A comprehensive evaluation of capture-recapture models for estimating software defect content. IEEE Trans. Softw. Eng. 2000;26(6):518–540.
Burnham KP, Overton WS. Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika. 1978;65:625–633.
Caulton DA. Relaxing the homogeneity assumption in usability testing. Behav. Inf. Technol. 2001;20:1–7.
Chapanis A. Some generalizations about generalization. Hum. Factors. 1988;30:253–267.
Coull BA, Agresti A. The use of mixed logit models to reflect heterogeneity in capture-recapture studies. Biometrics. 1999;55:294–301.
Cowles M. Statistics in Psychology: An Historical Perspective. Hillsdale, NJ: Lawrence Erlbaum; 1989.
Dalal SR, Mallows CL. Some graphical aids for deciding when to stop testing software. IEEE J. Sel. Area. Comm. 1990;8(2):169–175.
Dorazio RM. On selecting a prior for the precision parameter of Dirichlet process mixture models. J. Stat. Plan. Inference. 2009;139:3384–3390.
Dorazio RM, Royle JA. Mixture models for estimating the size of a closed population when capture rates vary among individuals. Biometrics. 2003;59:351–364.
Dumas J. The great leap forward: The birth of the usability profession (1988–1993). J. Usability Stud. 2007;2(2):54–60.
Dumas, J., Sorce, J., Virzi, R., 1995. Expert reviews: How many experts is enough? In: Proceedings of the Human Factors and Ergonomics Society Thirty-Ninth Annual Meeting. Santa Monica, CA: Human Factors and Ergonomics Society, pp. 228–232.
Eick, S.G., Loader, C.R., Vander Wiel, S.A., Votta, L.G., 1993. How many errors remain in a software design document after inspection? In: Proceedings of the Twenty-Fifth Symposium on the Interface. Fairfax Station, VA: Interface Foundation of North America, pp. 195–202.
Ennis DM, Bi J. The beta-binomial model: accounting for inter-trial variation in replicated difference and preference tests. J. Sens. Stud. 1998;13:389–412.
Faulkner L. Beyond the five-user assumption: benefits of increased sample sizes in usability testing. Behav. Res. Methods Instrum. Comput. 2003;35:379–383.
Gould JD. How to design usable systems. In: Helander M, ed. Handbook of Human–Computer Interaction. Amsterdam, Netherlands: North-Holland; 1988:757–789.
Gould JD, Boies SJ. Human factors challenges in creating a principal support office system: The Speech Filing System approach. ACM Trans. Inf. Syst. 1983;1:273–298.
Gould JD, Lewis C. Designing for Usability: Key Principles and What Designers Think. Yorktown Heights, NY: IBM Corporation; 1984.
Gould JD, Boies SJ, Levy S, Richards JT, Schoonard J. The 1984 Olympic message system: a test of behavioral principles of system design. Commun. ACM. 1987;30:758–769.
Guest G, Bunce A, Johnson L. How many interviews are enough? An experiment with data saturation and variability. Field Methods. 2006;18(1):59–82.
Hertzum M, Jacobsen NJ. The evaluator effect: A chilling fact about usability evaluation methods. Int. J. Hum. Comput. Interact. 2001;13:421–443.
Hornbæk K. Dogmas in the assessment of usability evaluation methods. Behav. Inf. Technol. 2010;29(1):97–111.
Hwang W, Salvendy G. What makes evaluators to find more usability problems?: A meta-analysis for individual detection rates. In: Jacko J, ed. Human-Computer Interaction, Part I, HCII 2007. Heidelberg, Germany: Springer-Verlag; 2007:499–507.
Hwang W, Salvendy G. Integration of usability evaluation studies via a novel meta-analytic approach: What are significant attributes for effective evaluation. Int. J. Hum. Comput. Interact. 2009;25(4):282–306.
Hwang W, Salvendy G. Number of people required for usability evaluation: The 10 ± 2 rule. Commun. ACM. 2010;53(5):130–133.
Jelinek F. Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press; 1997.
Kanis H. Estimating the number of usability problems. Appl. Ergon. 2011;42:337–347.
Kennedy, P. J., 1982. Development and testing of the operator training package for a small computer system. In: Proceedings of the Human Factors Society Twenty-Sixth Annual Meeting. Santa Monica, CA: Human Factors Society, pp. 715–717.
Law, E. L., Hvannberg, E. T., 2004. Analysis of combinatorial user effect in international usability tests. In: Proceedings of CHI 2004. Vienna, Austria: ACM, pp. 9–16.
Lewis, J. R., 1982. Testing small system customer set-up. In: Proceedings of the Human Factors Society Twenty-Sixth Annual Meeting. Santa Monica, CA: Human Factors Society.
Lewis JR. Sample sizes for usability studies: additional considerations. Hum. Factors. 1994;36:368–378.
Lewis, J. R., 2000. Evaluation of problem discovery rate adjustment procedures for sample sizes from two to ten (Tech. Report 29.3362). Raleigh, NC: IBM Corp. Available from: http://drjim.0catch.com/pcarlo5-ral.pdf.
Lewis JR. Evaluation of procedures for adjusting problem-discovery rates estimated from small samples. Int. J. Hum. Comput. Interact. 2001;13:445–479.
Lewis, J. R., 2006a. Effect of level of problem description on problem discovery rates: Two case studies. In: Proceedings of the Human Factors and Ergonomics Society Fiftieth Annual Meeting. Santa Monica, CA: HFES, pp. 2567–2571.
Lewis JR. Sample sizes for usability tests: mostly math, not magic. Interactions. 2006;13(6):29–33.
Lewis JR. Usability evaluation of a speech recognition IVR. In: Tullis T, Albert B, eds. Measuring the User Experience, Chapter 10: Case Studies. Amsterdam, Netherlands: Morgan-Kaufman; 2008:244–252.
Lewis JR. Usability testing. In: Salvendy G, ed. Handbook of Human Factors and Ergonomics. 4th ed. New York, NY: John Wiley; 2012:1267–1312.
Lewis JR, Sauro J. When 100% really isn’t 100%: Improving the accuracy of small-sample estimates of completion rates. J. Usability Stud. 2006;1(3):136–150.
Lewis, J. R., Henry, S. C., Mack, R. L., 1990. Integrated office software benchmarks: a case study. In: Proceedings of the Third IFIP Conference on Human-Computer Interaction–INTERACT ’90. Cambridge, UK: Elsevier Science Publishers, pp. 337–343.
Lindgaard, G., Chattratichart, J., 2007. Usability testing: What have we overlooked? In: Proceedings of CHI 2007. San Jose, CA: ACM, pp. 1415–1424.
Manning CD, Schütze H. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press; 1999.
Medlock MC, Wixon D, McGee M, Welsh D. The rapid iterative test and evaluation method: Better products in less time. In: Bias RG, Mayhew DJ, eds. Cost-Justifying Usability: An Update for the Internet Age. Amsterdam, Netherlands: Elsevier; 2005:489–517.
Nielsen, J., 1992. Finding usability problems through heuristic evaluation. In: Proceedings of CHI ’92. Monterey, CA: ACM, pp. 373–380.
Nielsen, J., 2000. Why you only need to test with 5 users. Alertbox, www.useit.com/alertbox/20000319.html. (Downloaded January 26, 2011).
Nielsen, J., Landauer, T. K., 1993. A mathematical model of the finding of usability problems. In: Proceedings of INTERCHI’93. Amsterdam, Netherlands: ACM, pp. 206–213.
Nielsen, J., Molich, R., 1990. Heuristic evaluation of user interfaces. In: Proceedings of CHI ’90. New York, NY: ACM, pp. 249–256.
Perfetti, C., Landesman, L., 2001. Eight is not enough. Available from: http://www.uie.com/articles/eight_is_not_enough/.
Sauro J. The relationship between problem frequency and problem severity in usability evaluations. J. Usability Stud. 2014;10(1):17–25.
Schmettow, M., 2008. Heterogeneity in the usability evaluation process. In: Proceedings of the Twenty-Second British HCI Group Annual Conference on HCI 2008: People and Computers XXII: Culture, Creativity, Interaction—Volume 1. Liverpool, UK: ACM, pp. 89–98.
Schmettow, M., 2009. Controlling the usability evaluation process under varying defect visibility. In: Proceedings of the 2009 British Computer Society Conference on Human-Computer Interaction. Cambridge, UK: ACM, pp. 188–197.
Schmettow M. Sample size in usability tests. Commun. ACM. 2012;55(4):64–70.
Schnabel ZE. The estimation of the total fish population of a lake. Amer. Math. Monthly. 1938;45:348–352.
Smith DC, Irby C, Kimball R, Verplank B, Harslem E. Designing the Star user interface. Byte. 1982;7(4):242–282.
Spool, J., Schroeder, W., 2001. Testing websites: five users is nowhere near enough. In: CHI 2001 Extended Abstracts. New York, NY: ACM, pp. 285–286.
Turner CW, Lewis JR, Nielsen J. Determining usability test sample size. In: Karwowski W, ed. The International Encyclopedia of Ergonomics and Human Factors. Boca Raton, FL: CRC Press; 2006:3084–3088.
Virzi, R. A., 1990. Streamlining the design process: Running fewer subjects. In: Proceedings of the Human Factors Society Thirty-Fourth Annual Meeting. Santa Monica, CA: Human Factors Society, pp. 291–294.
Virzi RA. Refining the test phase of usability evaluation: how many subjects is enough? Hum. Factors. 1992;34:457–468.
Virzi RA. Usability inspection methods. In: Helander MG, Landauer TK, Prabhu PV, eds. Handbook of Human–Computer Interaction. 2nd ed. Amsterdam, Netherlands: Elsevier; 1997:705–715.
Walia, G. S., Carver, J. C., 2008. Evaluation of capture-recapture models for estimating the abundance of naturally-occurring defects. In: Proceedings of ESEM ’08. Kaiserslautern, Germany: ACM, pp. 158–167.
Walia, G. S., Carver, J. C., Nagappan, N., 2008. The effect of the number of inspectors on the defect estimates produced by capture-recapture models. In: Proceedings of ICSE ’08. Leipzig, Germany: ACM, pp. 331–340.
Whiteside J, Bennett J, Holtzblatt K. Usability engineering: Our experience and evolution. In: Helander M, ed. Handbook of Human–Computer Interaction. Amsterdam, Netherlands: North-Holland; 1988:791–817.
Williams G. The Lisa computer system. Byte. 1983;8(2):33–50.
Wilson EB. Probable inference, the law of succession, and statistical inference. J. Am. Stat. Assoc. 1927;22:209–212.
Woolrych, A., Cockton, G., 2001. Why and when five test users aren’t enough. In: Vanderdonckt, J., Blandford, A., Derycke, A. (Eds.), Proceedings of IHM–HCI 2001 Conference, vol. 2. Toulouse, France: Cépaduès Éditions, pp. 105–108.
Wright PC, Monk AF. A cost-effective evaluation method for use by designers. Int. J. Man Mach. Stud. 1991;35:891–912.