
Chatbot Research 101 – the challenges of UX research for conversational AI and how to overcome them

Figure 1: Illustration of a workplace with screen, keyboard, and robot

Chatbots and conversational AIs are on everyone’s minds and can now be found in numerous applications and interfaces. But the question arises: Are conversational AIs really always the best choice?
When users need help completing their tasks and turn to a support bot, many seem to want to be transferred to a human as quickly as possible instead of dealing with the chatbot.

As we saw in our article “The factors that influence the UX of conversational AI”, there are various factors that influence the use and user experience of conversational AI. Now we want to take a look at what happens when we conduct UX research on conversational AI. What challenges might we encounter? How do we need to adapt our research methods to obtain valid and reliable results?

What is Conversational AI?

As a quick reminder:

"Conversational artificial intelligence (AI) encompasses technologies such as chatbots and virtual agents that users can talk to. They use large amounts of data, machine learning, and natural language processing (NLP) to imitate human interactions, recognize speech and text input, and translate their meaning into different languages."[1]

Good UX is always based on UX research, i.e., the involvement of real users with the aim of identifying their needs, motivations, and frustrations. The fundamental questions of UX research also apply to research on conversational AI:

  • What is the context of use, i.e., when/how/where is interaction needed?
  • What are the users’ needs?
  • What problems may already exist in terms of usage?
  • What are users’ expectations of a digital product? What motivations underlie these expectations?

However, there are some special features of conversational AI research that we should pay particular attention to.

What are the challenges and differences in UX research for conversational AI?

A key difference between conversational UIs and traditional UIs is that interaction is dialogue-based, relying on text and voice. This means, on the one hand, that multimodal forms of interaction are possible (e.g., written text and spoken language), but also that users provide very different input. For example, while user A may simply start speaking, using an imprecise prompt and even filler words, user B may type a short and concise description of what they want.

Figure 2: Conversation about an invoice (own illustration)

Language varies and is used differently; inputs differ, for example in terms of language style. Language is also always ambiguous, which can lead to misunderstandings – unlike clicking on a button. Another point that should definitely be considered is that language – both written and spoken – can be a barrier to the use of conversational UIs, especially if it is not tailored to the needs of the target audience.

Although our goals when researching conversational AI remain the same in principle, there are topics and areas that should be added or examined more closely; in short, our focus expands to include new aspects. The article by my colleague mentioned above makes clear which factors particularly influence the use of conversational AI. These and other topics are worth addressing in UX research as well: psychological, ethical, and security-related issues come into focus.

When conducting interviews and usability tests for conversational AI, we have to stay flexible and respond to unpredictable interactions that differ from those in traditional UIs. Because the technology evolves rapidly, prototypes for conversational-AI usability tests need to be technically advanced, whereas with "conventional" UIs it is possible to test pure concepts. At the same time, we need adaptable research methods to keep pace with new developments, including new metrics that measure aspects such as trust and the flow of conversation. Finally, technical challenges in remote research, such as background noise or speech recognition errors, play an even greater role with conversational AI.

In the following, I would like to present the three classic research methods of interviews, usability tests, and surveys/questionnaires, and what you should pay particular attention to when applying each method to UX research on conversational AI. To illustrate this, I will explain the procedure using an example: our absence assistant bot “Stevie Sloth.” The idea for this arose when our employees repeatedly reported uncertainties about attendance and absence times. Although the information was expected to be known, questions arose that were sometimes uncomfortable to ask. Our goal in developing the bot was to reduce uncertainty about basic questions regarding attendance and absence.

Interviews in the context of Conversational AI

Interviews are a UX research method in which a few carefully selected individuals are interviewed in depth with the aim of gaining a better understanding of the context of use.[2]

Before conducting the interviews, we first need to clarify what we want to find out in the conversations. Key topics here are exploring the mental model of the users, their previous experiences, and identifying user needs. Later, we will decide which user needs should be addressed by the product and whether conversational AI is the right solution at all. In addition to our regular questions about the context of use, there are other things to ask in the context of conversational AI, e.g., special features in language use/prompting, or attitudes toward ethical issues, meaning data and information protection, responsibility and transparency, and resource consumption.

Our absence assistant bot: Interviews

We also conducted UX research for our internal project, an assistant bot on the topic of “absence”: Four new colleagues agreed to participate in interviews in January 2024. We focused the interview guide we created for this purpose on the following three topics:

  • Previous experiences with ChatGPT and chatbots (and associated attitudes and expectations)
  • Onboarding experiences at Centigrade and previous employers
  • Vacation planning and requests (experiences, processes, feelings)

We were able to draw important insights for the development of the assistant bot from these interviews. For example, it was important to the interviewees that they receive the answer to a question the moment it arises, rather than reading the intranet page on vacation requests during onboarding and then forgetting it again. It also became clear that users need individual, role- and topic-specific answers to their questions, taking into account their own work model, working hours, scheduling bottlenecks, and resource planning. Another point that we certainly hadn’t considered beforehand was that users want to decide for themselves who to ask; for some questions, it may be more convenient to ask a chatbot, while for others, it may be preferable to discuss them with a colleague.

Based on the interviews, we derived user needs and thus focused on three points for the design of the assistant bot: receiving support and reassurance when questions arise, receiving binding information (e.g., on personal deadlines) as a priority, and applying for vacation independently, autonomously, and in a targeted and communicative manner.

Figure 3: Screenshot from LeanScope AI (own figure)

As mentioned above, these results led to the creation of our new “colleague” Stevie Sloth. Stevie Sloth has a clear roadmap with active support and/or help for self-help. He offers transparency and expectation management for the next steps, users can choose for themselves how much support they need, and he sends sources for further reading. During development, the focus was on “taking turns,” meaning that the bot asks questions, inquires how much support the user needs (short or long answer), responds to questions itself, and offers information such as “You may also be interested in.” This gets users more involved, actively provides them with faster support, and reinforces learning.

Usability Tests in the context of Conversational AI

In usability tests, representative users perform selected tasks in an interactive system with the aim of analyzing problems and measuring effectiveness, efficiency, and satisfaction.[2]

In order to conduct usability tests, we first need to define who should participate in the tests. To do this, screening criteria for test subjects are adapted to the defined user group and then a decision is made as to which use cases should be tested. These are then prepared in a test scenario with a pre-interview, the main part with the tasks, and a post-interview. In usability tests for conversational AI, tasks should definitely be predefined, but ideally in such a way that the prompting can be left entirely to the test subjects.

During the test, the think-aloud protocol is often used: participants are asked at the beginning of the test to think aloud, i.e., to verbalize their thoughts as they navigate through the user interface. With conversational AI, it is important to allow for flexibility, as many different interaction paths are possible, and to refrain from suggesting any wording for the participants' inputs. If necessary, specific (follow-up) questions should be asked to uncover the underlying mental model. During observation, the focus should be on the behavior and reactions of the test subjects and any usability problems that arise, but particular attention should also be paid to how the test subjects express themselves linguistically when communicating with the conversational AI.

Wizard-of-Oz-method as an alternative to classic Usability testing

This method describes a test procedure in which test subjects interact with an interface that appears to be autonomous but is (wholly or partially) controlled by a human being.[3]

The method is a modification of the classic usability test and requires less technical effort, as another person controls the interface, in this case the conversational AI, and provides the answers. It thus offers the opportunity to gather insights into interaction with conversational AI early on in product development and to record in great detail what expectations users have, how they formulate prompts, and, above all, how they react to different outputs.
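To make the wizard's human-in-the-loop role tangible, here is a minimal sketch in Python. All names (`WizardOfOzSession`, `participant_says`) are our own invention for illustration, not a standard tool: each participant utterance is logged with a timestamp, handed to the wizard, and the typed reply is logged as if it came from the bot, so the transcript can later be analyzed for prompt styles and reactions to different outputs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List

@dataclass
class Turn:
    role: str        # "participant" or "wizard"
    text: str
    timestamp: datetime

@dataclass
class WizardOfOzSession:
    """Logs a Wizard-of-Oz chat: the 'bot' replies are typed by a hidden human wizard."""
    wizard_reply: Callable[[str], str]   # in a live study: read from the wizard's console
    transcript: List[Turn] = field(default_factory=list)

    def participant_says(self, text: str) -> str:
        # Log the participant's prompt exactly as typed, for later linguistic analysis.
        self.transcript.append(Turn("participant", text, datetime.now(timezone.utc)))
        # Human-in-the-loop: the wizard composes the reply, invisible to the participant.
        reply = self.wizard_reply(text)
        self.transcript.append(Turn("wizard", reply, datetime.now(timezone.utc)))
        return reply

# Scripted wizard for demonstration; in a real test a researcher types these replies.
session = WizardOfOzSession(wizard_reply=lambda msg: "Could you tell me the travel dates?")
session.participant_says("I want to book my dream vacation.")
print(len(session.transcript))  # 2 turns logged
```

In a live study, `wizard_reply` would read from the researcher's console (e.g., via `input()` in a second window); the scripted lambda here only keeps the sketch self-contained.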

Our absence assistant bot: Usability tests

The first version of Stevie Sloth was also tested directly with usability tests, in which the test subjects were asked to complete two specific tasks:

  • "Please talk to Stevie to find out how you can apply for your dream vacation."
  • "Please give Stevie an inappropriate/unfriendly answer or ask him an inappropriate question."

Observations, further questions about the perceived user experience, and screenshots showing how test subjects wrote prompts provided us with valuable insights that we could then use to improve the assistant bot. If you would like to learn more about Stevie Sloth and other assistant bot projects, feel free to read my colleague Sarah’s blog article.

Questionnaires

Questionnaires allow for a less detailed but more widely distributed collection of data, facts, and opinions.

The UEQ (User Experience Questionnaire)[4] is often used to measure the UX of user interfaces because it validly captures several facets of usability and user experience: attractiveness, perspicuity, efficiency, dependability, stimulation, and novelty. From Alessandra’s article, we know that there are other factors in conversational AI that influence usage and the user experience. Questionnaires are a useful tool for finding out how the various factors are perceived and what role they play in a defined user group.

The UEQ can be expanded modularly as UEQ+[5] with questions that are relevant to conversational AI, such as whether and how accurately instructions are understood, to what extent the AI is trusted and considered secure, reputable, and transparent, or whether the response behavior is perceived as natural and appropriate.
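To show how such questionnaire data is typically scored, here is a minimal, hypothetical sketch in Python. The item grouping is invented for illustration (the real UEQ has 26 items across six scales): items are rated on 7-point semantic differentials, recoded to -3 … +3, and each scale score is the mean of its item ratings across all respondents.

```python
from statistics import mean
from typing import Dict, List

def ueq_scale_scores(responses: List[Dict[str, List[int]]]) -> Dict[str, float]:
    """Average each scale's item ratings (recoded to -3..+3) over all respondents."""
    scales = responses[0].keys()
    return {
        scale: mean(rating for resp in responses for rating in resp[scale])
        for scale in scales
    }

# Two hypothetical respondents, four illustrative items per scale.
answers = [
    {"efficiency": [2, 3, 2, 1], "novelty": [0, -1, 1, 0]},
    {"efficiency": [1, 2, 2, 2], "novelty": [-1, 0, 0, 1]},
]
scores = ueq_scale_scores(answers)
print(scores["efficiency"])  # 1.875
```

Scores near +3 indicate a strongly positive rating on that scale, scores near 0 a neutral one; the same pattern extends to added UEQ+ modules such as trust.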

However, it is also possible to use specific questionnaires, such as the “Semantic Differential Scale for AI Trust,” which measures cognitive and affective trust using various subscales and semantic differentials. There are now even questionnaires that specifically aim to measure the UX of conversational AI, e.g., CASUX[6], in which the authors also take anthropomorphism into account by measuring the “humanity” of conversational AI.

Conclusion

UX research on conversational AI presents challenges that you should be prepared for, but which can be easily overcome with minor adjustments, especially in the preparation phase. It is particularly important to consider issues such as trust and data protection, and to maintain a certain degree of flexibility, especially when conducting usability tests.

Above all, however, it is essential to conduct UX research in the first place, because this may of course lead to the conclusion that conversational AI is not the right solution to meet existing user needs. Conversational AI is not a universal solution.

Sources

[1] What is conversational AI? (2021, September 28). IBM. https://www.ibm.com/think/topics/conversational-ai

[2] UXQB e.V. (2023). CPUX-F Curriculum Version 4.01. UXQB. Retrieved August 6, 2025, from https://uxqb.org/public/documents/CPUX-F_DE_Curriculum-und-Glossar.pdf

[3] Paul, S., & Rosala, M. (2024, April 19). The Wizard of Oz Method in UX. Nielsen Norman Group. https://www.nngroup.com/articles/wizard-of-oz/

[4] Laugwitz, B., Held, T., & Schrepp, M. (2008, November). Construction and evaluation of a user experience questionnaire. In Symposium of the Austrian HCI and usability engineering group (pp. 63-76). Berlin, Heidelberg: Springer Berlin Heidelberg.

[5] Schrepp, M. (2021, October). Measuring user experience with modular questionnaires. In 2021 International Conference on Advanced Computer Science and Information Systems (ICACSIS) (pp. 1-6). IEEE.

[6] Faruk, L. I. D., Pal, D., Funilkul, S., Perumal, T., & Mongkolnam, P. (2024). Introducing CASUX: A Standardized Scale for Measuring the User Experience of Artificial Intelligence Based Conversational Agents. International Journal of Human–Computer Interaction, 1–25. https://doi.org/10.1080/10447318.2024.2359206


It all starts with a good conversation. So let's talk together about the possibilities for your digital product development. We look forward to hearing from you.
