andrea l spray

Clarifying Statistical Usability



Wow! That’s a Wide Confidence Interval!


We started this example with 9 out of 10 participants passing the scenario. In my experience, this indicates a fairly effective design. But the best we can say, at the end of our statistical analysis, is that somewhere between 57.5% and 100% of real end users will be able to complete the task. And there's even a 5% chance that the true success rate is lower than 57.5%!
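To see roughly where the 57.5% figure comes from, here is a minimal Python sketch. It assumes the interval was computed with the adjusted Wald (Agresti-Coull) method at 95% confidence (z = 1.96). The page itself doesn't name the method, and `adjusted_wald_interval` is a hypothetical helper written for illustration, not a library function. For 9 successes out of 10, it gives a lower limit of about 57.4%. That is within a tenth of a point of the figure above; small gaps like this usually come from rounding or from a slightly different variant of the method.

```python
import math

def adjusted_wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Adjusted Wald (Agresti-Coull) confidence interval for a binomial proportion.

    Adds z^2/2 successes and z^2/2 failures, applies the ordinary Wald
    formula to the adjusted counts, then clips the limits to [0, 1].
    """
    n_adj = n + z ** 2                       # adjusted sample size
    p_adj = (successes + z ** 2 / 2) / n_adj # adjusted success proportion
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# 9 of 10 participants completed the scenario.
low, high = adjusted_wald_interval(9, 10)
print(f"95% CI for 9/10: {low:.1%} to {high:.1%}")
```

The unclipped upper limit for 9/10 lands slightly above 100%, which is why the interval is reported as topping out at exactly 100%.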

So while we've demonstrated that we can calculate accurate statistics from usability data, what we gain in accuracy we lose in punch. It's much more satisfying to say "90% of users will successfully complete this task" than "somewhere between 57.5% and 100% of users will successfully complete this task". Given that usability testing is typically conducted with fewer than 20 participants, even effective designs will have worrisome Confidence Intervals.

James Lewis, collaborator with Jeff Sauro on MeasuringUsability.com, puts it this way:

“[Binomial Confidence Intervals] cannot be used with a small sample to prove that a success rate is acceptably high. With small samples even if the observed defect percentage is 0 or close to 0, the interval will be wide, so it will include defect percentages that are unacceptable. Therefore it is relatively easy to prove (requires a small sample) that a product is unacceptable, but it is difficult to prove (requires a large sample) that a product is acceptable.”
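Lewis's point is easy to check numerically. The sketch below again assumes the adjusted Wald (Agresti-Coull) interval at 95% confidence, which this page doesn't name. The helpers `adjusted_wald_lower` and `participants_needed` are hypothetical, written only for illustration. The sketch prints the 95% lower limit when every participant succeeds (zero observed defects) at several sample sizes. It then searches for the smallest flawless sample that pushes that lower limit to 90%.

```python
import math

def adjusted_wald_lower(successes: int, n: int, z: float = 1.96) -> float:
    """Lower limit of the adjusted Wald (Agresti-Coull) interval, clipped at 0."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    return max(0.0, p_adj - z * math.sqrt(p_adj * (1 - p_adj) / n_adj))

# Every participant succeeds (zero observed defects) at each sample size.
perfect = {n: adjusted_wald_lower(n, n) for n in (5, 10, 20, 50, 100)}
for n, low in perfect.items():
    print(f"{n:>3}/{n:<3} succeed -> 95% lower limit on success rate: {low:.1%}")

def participants_needed(target: float) -> int:
    """Smallest n for which a perfect n/n result puts the lower limit at or above target."""
    n = 1
    while adjusted_wald_lower(n, n) < target:
        n += 1
    return n

print("Flawless participants needed to show a success rate of at least 90%:",
      participants_needed(0.90))
```

Even a perfect score from a typical small sample leaves the lower limit well short of 90%. Showing that a product is acceptable takes a much larger flawless sample, which is the asymmetry Lewis describes.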

So now we come full circle. First, if usability data is so mushy, why does it work so well? And second, is there any use for statistics in usability at all?

