andrea l spray

"...numbers are important for several reasons. They can confirm (or fail to confirm) key concepts and theories as well as the initial findings of more exploratory field work. They tell you whether the inferences you make are well grounded or off the mark. And sometimes, good hard statistical analysis can do more than confirm what you already largely know: It can point you toward connections or conclusions you would not otherwise have seen." -- Richard Florida, The Rise of the Creative Class

Clarifying Statistical Usability


How to Use Usability Statistics


“… it’s hard to show usability, it’s much easier to show un-usability” – Jeff Sauro

So does this mean that statistics are never necessary? No. As your emphasis moves from high-level (task flow) issues to widget design issues, the effects become smaller and the need for scientific precision in testing increases. More rigorous testing is also appropriate if your user base is so large that even a small percentage of failures will affect hordes of users or cost substantial sums of money. In both cases, the difference between option A and option B might be pretty small and difficult to detect with a small sample.

And of course, if your client insists on quantitative justification for changes, you will need to conduct a statistical analysis.

In all of these scenarios, however, you shouldn't be relying on small-sample studies anyway, so the point is moot. If scientific precision is required, conduct a large-scale usability study; automated testing software makes this a viable option.
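How large is "large-scale"? The sketch below is one rough way to estimate it, assuming you are comparing mean scores (such as satisfaction ratings) between two designs with a two-sided test at α = 0.05 and 80% power, using the standard normal-approximation formula n ≈ 2((z₁₋α/₂ + z₁₋β)·σ/δ)² per design. The function name `sample_size_per_group` and the standard deviation of 1 rating point are hypothetical placeholders, not values from any study described here.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(delta: float, sigma: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per design to detect a true
    difference `delta` in mean scores, given a score standard deviation
    `sigma`, using a two-sided test and the normal approximation."""
    std_normal = NormalDist()                          # mean 0, sd 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)        # ~1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)                # ~0.84 for 80% power
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return ceil(n)                                     # round up to whole people


# Hypothetical: ratings on a 1-5 scale with a standard deviation of 1 point.
print(sample_size_per_group(delta=1.0, sigma=1.0))   # 16 per design
print(sample_size_per_group(delta=0.25, sigma=1.0))  # 252 per design
```

Because n grows with 1/δ², halving the difference you want to detect roughly quadruples the number of participants, which is why small widget-level differences push you toward large studies.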

But even with high-level usability testing, statistics can help temper both your confidence in and your skepticism about the data you observe. Consider this satisfaction-rating example. I gathered satisfaction ratings (on a 1-to-5 Likert scale) for three different designs of the same task. The average rating was 2.2 for design A, 3.4 for design B, and 4.2 for design C. Looking at those ratings, I thought design C was obviously the best option because it received the highest rating, and based on these results I intended to push my client to go with it, even though it was considerably more expensive and difficult to develop.

However, when I ran a test of statistical significance (an analysis not demonstrated in this tutorial) on those ratings, I found no statistically significant difference between design B's 3.4 rating and design C's 4.2 rating. User satisfaction was essentially the same. So what? Well, this meant that, as a usability practitioner, I could push less hard for design C, because design B was, as far as the data could tell, just as good. I could support option B, the less expensive approach, without wasting valuable usability capital on a difference that didn't matter. In real-world usability, where you're forced to pick your battles, this was huge.
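The specific test behind that result isn't named here. As a minimal sketch of one option, the Python below runs an exact permutation test on the difference in mean ratings between two designs. The ratings are hypothetical, invented only so that their averages match design B's 3.4 and design C's 4.2; they are not the data from the study described above, and the function name `permutation_test` is likewise illustrative.

```python
from itertools import combinations
from statistics import mean


def permutation_test(a: list[float], b: list[float]) -> float:
    """Exact two-sided permutation test on the difference in means.

    Returns the share of all ways to split the pooled ratings into groups
    of the original sizes whose mean difference is at least as large as
    the observed one (the p-value)."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [x for i, x in enumerate(pooled) if i not in chosen]
        # Small tolerance so float rounding can't drop ties with the observed split.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical ratings, chosen only so the means match the example (3.4 and 4.2).
design_b = [3, 4, 2, 4, 4]
design_c = [5, 4, 3, 5, 4]
p_value = permutation_test(design_b, design_c)
print(round(p_value, 3))  # 0.317, well above the usual 0.05 cutoff
```

A permutation test is used in this sketch because it doesn't assume normally distributed scores, which suits small samples of ordinal Likert ratings; a t-test or a Mann-Whitney U test are common alternatives. Exact enumeration checks every possible split (252 for two groups of five), so for larger samples you would draw random splits instead.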
