Sample Size in Usability Tests and User Interviews – Less is More?

Marvin Mader

January 26th, 2023

User research in the form of usability tests or user interviews has become an indispensable part of the UX world, especially when it comes to developing or improving a product. Developers gain insight into what users really need and what potential problems there may be with the product. A key point of research is the selection of the right sample. If the sample is not meaningful or representative of the defined user group, the validity of the findings, and thus the quality of the product, will suffer.

What is a sample again?

A sample is a subset of a population. In UX, the population consists of all persons who will later use the product, and it is made up of the various user groups. Each user group is represented by a persona, an archetypal user. If you want to make a statement about a persona without interviewing every single person from a user group, you draw a sample. For the sample to be meaningful, two conditions in particular must be met:

The sample must be large enough to make valid inferences and predictions.
The sample must be representative, i.e. it should reflect the user group as well as possible at all levels.

To make a sample as representative as possible, care should be taken to include people from all levels of the population.

Figure 1: cloudfront.net/wp-content/uploads/2022/02/WP_Stichprobe-1024×576.jpg

The example in the figure would therefore only be representative to a limited extent, as the people marked in red make up the majority in the population but are not included in the sample at all.

However, the figure cannot be transferred one-to-one to the user-centered approach in UX, because the sample here is drawn from a specific user group. Since a user group is usually represented by a persona with certain motivations, frustrations, and characteristics, the sample should above all reflect those aspects of the persona that are relevant to the project.
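One way to make representativeness tangible is to compare the share of each subgroup in the sample with its share in the population. The following is a minimal sketch, not a formal statistical test. The subgroup labels, the data, and the `tolerance` threshold are hypothetical and chosen only to echo the situation in Figure 1, where the majority group is missing from the sample.

```python
from collections import Counter


def subgroup_shares(labels):
    """Return the share of each subgroup label in a list of people."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}


def skewed_subgroups(population, sample, tolerance=0.3):
    """Return subgroups whose sample share deviates from their
    population share by more than `tolerance` (an assumed threshold)."""
    pop_shares = subgroup_shares(population)
    sample_shares = subgroup_shares(sample)
    flagged = {}
    for group, pop_share in pop_shares.items():
        sample_share = sample_shares.get(group, 0.0)
        if abs(sample_share - pop_share) > tolerance:
            flagged[group] = (pop_share, sample_share)
    return flagged


# Hypothetical data: "red" is the majority in the population
# but does not appear in the sample at all.
population = ["red"] * 6 + ["blue"] * 3 + ["green"] * 3
sample = ["blue", "blue", "green", "green"]

# Prints each flagged subgroup with (population share, sample share).
print(skewed_subgroups(population, sample))
```

In a real project the labels would be the project-relevant characteristics of the persona rather than colors, and the threshold would have to be agreed on by the team.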

But how many test persons do you actually need to evaluate a product in a meaningful way?

At first glance, one might assume that with more and more test subjects, more meaningful results can be achieved. But why is it, especially in user research, that often only 4-5 people are used in usability tests or interviews? And why, on the other hand, are sometimes several hundred participants recruited in scientific studies? Is it possible at all to make well-founded statements about a newly developed product, which is supposed to serve a larger user group later on, with this handful of testers in user research?

The scientific perspective of sample selection

Psychological studies in particular do not simply test blindly. The required number of participants is calculated in advance, and data collection continues until that number is reached. Throughout the survey, representativeness must be kept in mind so that no distortions occur. Sampling bias in particular can cause problems. This and other biases in user studies will be discussed in more detail in one of the following blog articles.

But how do you know how many participants are needed? The required number is called the “optimal sample size” and is calculated statistically. I will spare you the calculation here; the free software program “G*Power” is usually used for this purpose.

The optimal sample size fulfills two conditions:

First, the sample size is so large that the expected effect can be statistically validated. This means that the effect can be detected with minimal expenditure of time, money and test persons, should it actually be present in the population.
Second, the sample is small enough that effects smaller than the expected effect do not become statistically significant.

Thus, the optimal sample size represents a compromise between practical considerations (cost and effort) and statistical significance (the expected effect is found). Calculating the required number of participants in advance mainly saves resources while still ensuring that the expected effect can be found. This economic factor in particular plays a major role in defining the sample size for usability tests and user interviews, since most projects have a limited budget.
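For readers who want a feeling for the numbers, here is a rough sketch of such a calculation for one common case: comparing the means of two groups with a two-sided test. It uses the textbook normal approximation rather than the exact method of a dedicated power-analysis tool, so exact t-based calculations typically return slightly larger values. The function name and the default values for `alpha` and `power` are assumptions made for this illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided comparison
    of two group means.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d)^2,
    where d is the expected standardized effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller expected effects require considerably more participants.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {sample_size_per_group(d)} participants per group")
```

The sketch shows why scientific studies often recruit dozens or hundreds of participants: the smaller the effect you expect, the faster the required sample grows.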

Costs vs. benefits

The “Return on Investment”, or “ROI” for short, is a key figure that shows the profit of an activity in relation to its costs. For user tests or interviews, this means that each person recruited and tested first costs a company money (costs). At the same time, useful information about a product is generated (benefit). If more and more people are tested with the same test material, the costs add up, while the amount of newly acquired information decreases. The ROI decreases. The Nielsen Norman Group, the world’s leading research group in the field of usability and user experience, has made this clear using the example of user problems found in a usability test.

Number of test users vs. usability problems found

Figure 2: www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/

With the first person you get the most new insights
The second person reveals further problems, but in some cases already known problems are also picked up on
The more persons are tested, the less new information is generated from the tests
Between six and 15 test persons, there is only a very small increase in the number of problems found
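The curve behind these observations can be reproduced with the simple model described in the linked Nielsen Norman Group article: the expected share of problems found with n test users is 1 - (1 - L)^n, where L is the share of problems a single user uncovers. The article reports a typical value of L = 31%. The sketch below assumes this value, and the function name is chosen for this illustration. In a concrete project L can differ, and the percentages shift accordingly.

```python
def share_of_problems_found(n_users, discovery_rate=0.31):
    """Expected share of usability problems found after n_users sessions,
    assuming every user uncovers each problem with the same probability."""
    return 1 - (1 - discovery_rate) ** n_users


for n in (1, 2, 3, 5, 10, 15):
    print(f"{n:2d} users: {share_of_problems_found(n):.0%}")
```

The model treats all problems as equally easy to find and all users as interchangeable, which is a simplification. It still captures the main point: each additional person mostly rediscovers what earlier persons already found.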

Therefore, it must be carefully weighed how many test persons are used for a project, so that the user research budget is not exceeded while all information important for development is still obtained. But what is the best way to decide how many people to test?

An obvious solution to this problem would be to simply follow the lead of science and calculate an optimal sample size for the project at hand. So why is this approach hardly ever used in practice?

In scientific studies, especially psychological ones, everything revolves around statistics. The goal is to show significant results and large effect sizes. For this very reason, it is also necessary to determine the optimal sample size. User research is less about achieving the highest possible effect size in the use of a product or finding significant differences between two prototypes. Often it is about the qualitative characteristics of a product and the usability problems that can occur. So the question to ask in user research is the following:

What is the minimum number of people I need to interview/test in order to develop a product that appeals to the target group and that can be used without problems?

For this question, the answer is not the optimal sample size from scientific studies.

So how big is big enough?

The decisive answer is that it depends on the methodology used.

The Nielsen Norman Group already tried to find an answer to this question in the early 2000s and has since come up with different results for different methods of user research. In the following, we will take a closer look at usability tests and interviews.

Optimal sample in usability tests

In most usability tests, a sample size of 5 +/- 2 is used. This number is based on the recommendations of the Nielsen Norman Group. The researchers base their argument on the number of usability problems that are uncovered per test person (see previous figure).

As the graph shows, a sample of only 5 people uncovers roughly 85% of the usability problems in a product according to the Nielsen Norman Group's model. Beyond 5 people, the costs on average outweigh the little extra information gained, so the ROI would be too low. The recommendation is therefore to conduct each usability test with about 5 people and to run as many usability tests as the budget allows over the course of product development, so that the product is continuously improved.
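The cost-benefit argument can be illustrated by looking at what each additional participant contributes. The sketch below assumes the discovery model from the linked Nielsen Norman Group article, in which a single user uncovers a share L = 31% of all problems, so the n-th participant is expected to reveal L * (1 - L)^(n - 1) of all problems for the first time. The cut-off `min_gain` of five percentage points is a hypothetical stand-in for "the session is still worth its cost" and is not taken from the article.

```python
def marginal_gain(n, discovery_rate=0.31):
    """Expected share of all problems first revealed by the n-th participant."""
    return discovery_rate * (1 - discovery_rate) ** (n - 1)


def participants_worth_testing(min_gain, discovery_rate=0.31, max_users=50):
    """Count participants whose expected marginal gain is at least min_gain."""
    n = 0
    while n < max_users and marginal_gain(n + 1, discovery_rate) >= min_gain:
        n += 1
    return n


for n in range(1, 9):
    print(f"participant {n}: {marginal_gain(n):.1%} new problems")

# Number of participants before the gain drops below the assumed cut-off.
print(participants_worth_testing(min_gain=0.05))
```

The result is sensitive to both assumptions: a lower discovery rate or a lower cut-off justifies more participants per round, which is why the number is a rule of thumb rather than a fixed law.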

Optimal sample in user interviews

User interviews are a little different. Here, there is not just one number to stick to. Nevertheless, a similar picture to the usability tests emerges here as well. More people bring more results, but only up to a certain point.

Figure 3: www.nngroup.com/articles/interview-sample-size/

Interviews are often about the qualitative experiences of users, which is why, according to the Nielsen Norman Group, the number of people needed depends on two factors. First, it depends on the research question and its scope. If the scope is broad, more people are needed to generate valid results. If the scope is narrow, as few as five people may be sufficient to deliver representative results. Therefore, we at Centigrade try to keep the scope as lean as possible at the beginning of our projects, so that we obtain results that are as reliable as possible for this scope. Second, it depends on the diversity of the users. To ensure representativeness, more test persons are needed for a diverse user base than for a specialized group of users. The recommendation of the Nielsen Norman Group is to start small (e.g. with five test persons). If it becomes apparent during the sessions that every test person still generates a lot of new information, the number of test persons can be gradually increased until sufficient information has been collected.
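The "start small and extend if needed" idea can be turned into a simple stopping rule: keep interviewing until several interviews in a row add no new theme. The sketch below is one possible way to operationalize this and is not a procedure prescribed by the Nielsen Norman Group. The `patience` parameter and the coded themes are hypothetical.

```python
def interviews_until_saturation(interview_themes, patience=2):
    """Return the number of interviews conducted at the point where
    `patience` consecutive interviews have added no new theme.

    interview_themes: list of sets, one set of coded themes per
    interview, in the order the interviews were held.
    Returns len(interview_themes) if that point is never reached.
    """
    seen = set()
    without_new = 0
    for count, themes in enumerate(interview_themes, start=1):
        new_themes = set(themes) - seen
        seen |= new_themes
        without_new = 0 if new_themes else without_new + 1
        if without_new >= patience:
            return count
    return len(interview_themes)


# Hypothetical coded themes from eight interviews.
interviews = [
    {"login", "search"},
    {"search", "export"},
    {"login", "pricing"},
    {"export"},
    {"onboarding"},
    {"search", "pricing"},
    {"login"},
    {"export", "onboarding"},
]
print(interviews_until_saturation(interviews))
```

With a broad scope or very diverse users, new themes keep appearing for longer, so the rule stops later. This mirrors the two factors described above.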

Conclusion

In summary, in user research it is often sufficient to interview or test only a small number of about five persons. Compared with a large sample, the additional knowledge gained is often small, and costs stay low with fewer testers. Testing fewer persons is especially worthwhile with lean scopes. Nevertheless, representativeness should always be kept in mind when selecting the sample, and if the question or the method requires it, more test persons should be used.