There is a hierarchy of usability issues that begins with task flow and ends with widget interaction. If the task flow doesn't work for the end user, it doesn't matter how well the widgets are designed; the user is still going to struggle. Consequently, the most crucial insights come from usability findings near the top of that hierarchy. And because issues at the top of the hierarchy are so fundamental to the task, their effect is large enough that you don't need statistical analysis to know there's a problem. The observed data, together with the corresponding anecdotes, are enough. This is often described as "the big a-ha!"
It's clear that usability testing is a poor method for demonstrating that a design is usable. The appropriate use of usability testing is, instead, to:
In neither case is the type of statistical analysis we've conducted in this tutorial required. The Confidence Interval may help put the observed Success Probability in perspective to create a more nuanced conclusion, but it won't make the results any more definitive. Again, this observed data in all its mushiness is adequate to make such comparative judgments.
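To make the point about the Confidence Interval concrete, here is a minimal sketch of computing one around an observed Success Probability. It assumes the Wilson score interval, which may not be the method used earlier in this tutorial. The function name `wilson_interval` and the sample figures (4 successes out of 5 participants) are hypothetical, chosen only for illustration.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence. This is an assumed
    method, not necessarily the one used elsewhere in the tutorial.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials  # observed success probability
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    ) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical test result: 4 of 5 participants completed the task.
low, high = wilson_interval(4, 5)
print(f"observed {4 / 5:.0%}, 95% CI roughly {low:.0%} to {high:.0%}")
```

With a sample this small the interval is very wide, which is the sense in which it adds perspective to the observed figure without making the result more definitive.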