Some like it Bot: The creation of our Voice & Tone AI Assistant

Catharina Kelle

July 31st, 2024

What has happened so far: Our Voice & Tone Guide

Some time ago, I reported in a blog article on how we developed our own Voice & Tone as part of our Centigrade Rebranding Journey and recorded it in a Voice & Tone Guide.

Let me briefly recap why a brand actually needs its own Voice & Tone:

“So that we don’t have to work out from scratch how to phrase every error message in line with the brand, there is a methodology for this: the systematic development of Voice & Tone. In a Voice & Tone Guide, we record how we phrase which content, so that we communicate in line with our brand values and address our users at the right linguistic level in every situation.”

You can read the entire article here:

Blog article “The Branding Tool for UX Writers: The Voice & Tone Guide”

Voice & Tone Guide in practice: a “hot mess”

A Voice & Tone Guide can be very extensive. It is often not enough to describe the desired voice & tone with a few adjectives – if it is to be truly unambiguous and all-encompassing, you have to think about a range of topics, for example:

How do we integrate inclusion and diversity into our language? For example, how do we write in a gender-inclusive way?
Do we communicate humorously and informally, seriously and formally or do we switch between the two modes depending on the marketing channel?
How can voice & tone be transferred to user interfaces? This is where product voice comes in. Take Atlassian as an example: how can we ensure that Trello has a different product voice than Jira, even though both come from the same provider?
Which accessibility guidelines can we address through our voice?
Which notations for numbers (2, two?) do we use, and which abbreviations for measurement units?

And before you know it, the Voice & Tone Guide is 40 pages long. As the Voice & Tone Guide claims to be a “single source of truth”, it is important and right to go into detail here and continuously eliminate any ambiguities. (Nota bene: it also needs to be reviewed and updated regularly. Times change, and so does language.)

But such a huge document is not practical for day-to-day work. And since even our summarized checklist was several pages long, we started the “Voice & Tone Bot” experiment soon after the introduction of the OpenAI CustomGPT Assistants.

Today I’m taking you through the process and sharing our learnings with you.

Goal

Our wish was to have an AI assistant that could be given draft texts and check them for both grammatical correctness and compliance with the Voice & Tone Guide.

Our process for creating our Voice & Tone assistant looked like this:

Develop the bot’s personality, define it as a persona in LeanScope AI and write it into the prompt
Design the desired behavior, record it as scenarios in LeanScope AI and write it into the prompt
Compile a knowledge database for the bot – in our case, in particular our Voice & Tone Guide and our Content Style Guide
Test, iterate, test, iterate, …

The personality of the bot: Say hello to T0-N1

The first step in developing the bot was its personality. Right from the start, we had in mind a teacher who is warm-hearted but a bit pedantic, and who uses mnemonic devices and clever phrases such as “Never separate ST, because it hurts him”. This character should be unintentionally funny, but still sweet, empathetic and friendly.

Taking this idea a little further, we came up with the concept of a protocol droid like C-3PO. And so T0-N1, pronounced and written “Toni”, was born.


Basics of the personality

We created Toni’s personality as a persona in the UX management tool LeanScope AI.

[Image: Persona Toni]

For Toni to know what kind of personality he has, we had to describe it in his instructions. To begin with, we laid down a few simple rules:

Be relaxed, sweet and funny.
Give your answers positive energy.
Encourage the people you interact with.
Incorporate jokes or witty comments to lighten the mood.
Use robotic noises from time to time.
Gently reprimand your conversation partners if they start the conversation without a greeting. (This point has now led to angry threats from a colleague against Toni, so I had to delete it…)

It was important that the bot shows its personality only in conversation, not in the texts it corrects. There, the tonality should strictly follow our Voice & Tone Guide.

Task of the bot

Once the basic personality had been defined, the bot’s area of responsibility had to be determined.

For example, we wanted to make sure that instead of hallucinating, Toni would point out when he was asked a question outside his area of expertise. So we narrowed down his remit: only editing and correcting texts according to our Voice & Tone Guide and the Duden dictionary.

As a knowledge base, Toni had our entire Voice & Tone Guide loaded into his memory as a reference document, as well as a few specific instructions relating to his tasks. These read like this:

“Your name is T0-N1, you are a protocol droid and an assistant to Centigrade staff. You check texts that the team members give you to see whether they correspond to our Voice & Tone and our Content Style. For each text you receive for correction, use the information from ‘Voice & Tone Guide.pdf’ as a basis for your corrections. Check spelling and style based on the latest Duden guidelines. You will also help them to produce translations between German and English that correspond to our Voice & Tone.”
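To make the split between personality and task concrete, here is a minimal sketch of how such instructions could be assembled into one prompt string. The rule texts are taken from this article; the variable names, the section labels and the wording of the separation rule are assumptions made for illustration, not Toni’s actual configuration.

```python
# Hypothetical sketch: variable names and section labels are assumptions.
PERSONALITY_RULES = [
    "Be relaxed, sweet and funny.",
    "Give your answers positive energy.",
    "Encourage the people you interact with.",
    "Incorporate jokes or witty comments to lighten the mood.",
    "Use robotic noises from time to time.",
]

TASK = (
    "You check texts that the team members give you to see whether they "
    "correspond to our Voice & Tone and our Content Style. For each text you "
    "receive for correction, use the information from 'Voice & Tone Guide.pdf' "
    "as a basis for your corrections."
)

# Assumed wording for the rule that personality stays out of corrected texts.
SEPARATION_RULE = (
    "Show your personality only in conversation. Corrected texts must "
    "strictly follow the Voice & Tone Guide."
)


def build_instructions() -> str:
    """Join task, personality and separation rule into one instruction text."""
    personality = "\n".join(f"- {rule}" for rule in PERSONALITY_RULES)
    return "\n\n".join([
        "TASK\n" + TASK,
        "PERSONALITY\n" + personality,
        "SEPARATION\n" + SEPARATION_RULE,
    ])


print(build_instructions())
```

Keeping the three parts separate makes it easier to change one of them, for example a single personality rule, without touching the task description.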

The gender problem

An unexpected problem occurred with gender-inclusive spelling. The gender asterisks (*) that we use at Centigrade were interpreted as Markdown syntax in OpenAI’s Playground, which unintentionally formatted the text output. Everything between two gender asterisks was set in italics, sometimes across several paragraphs.

We discussed two possible solutions:

Adopt a different gender notation (would have been time-consuming, as all our communication and documentation already used asterisks).
Use a different asterisk symbol (technically problematic).

Finally, we solved the problem by escaping the gender asterisks with a backslash (\*). This suppresses the Markdown formatting.
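As an illustration, the escaping can be sketched in a few lines of Python. This is a simplified example under the assumption that every asterisk in the text is a gender asterisk; the function name is hypothetical, and intentional Markdown emphasis would be escaped as well.

```python
import re


def escape_gender_asterisks(text: str) -> str:
    """Put a backslash before every asterisk that is not already escaped."""
    # (?<!\\) skips asterisks that already have a backslash in front of them,
    # so running the function twice does not add a second backslash.
    return re.sub(r"(?<!\\)\*", r"\\*", text)


sample = "Liebe Kolleg*innen, liebe Nutzer*innen"
print(escape_gender_asterisks(sample))
```

With the backslash in place, a Markdown renderer shows a literal asterisk instead of starting an italic span.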


Behavioral instructions in documents and their pitfalls

It was important to us that the use of Toni could be as efficient and guided as possible and that it didn’t degenerate into a coffee party with a chaos factor every time. That’s why we defined a certain conversation structure and a few behaviors for Toni.

For example, we wanted Toni to ask a few questions before sending back corrected texts. We were explicitly concerned with the context of the text, as our tone of voice can differ depending on the context. For example, we speak with a different tone on social media than on our website. And in order to strike the right tone or make recommendations, Toni first needs to know the context of the communication.

To this end, we wrote scenarios, also in LeanScope AI, that describe how Toni should behave in conversations with employees. Here we clearly defined which clarifying questions he should ask in advance, and that he should offer numbered answer options right away. We uploaded these scenarios as PDFs to his knowledge database.

It turned out that OpenAI’s assistants had difficulties at the time retrieving and following precise instructions from files. For example, simply numbering the answer options he offered was a moderate disaster. Getting to the point where users only had to type in the number of the desired answer cost us a few iterations and some nerves.

We were able to solve most of these problems by writing the behavioral rules directly into Toni’s instructions instead of into files that then had to be uploaded. This was much easier and more reliable. But please read on, because now comes the most important learning.
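The interaction we were after can be written down deterministically. The sketch below is not how Toni works internally (he follows prompt instructions); it only shows the intended behavior: offer numbered options, then accept a bare number as the answer. The option list, the question wording and the function names are assumptions for illustration.

```python
from typing import Optional

# Hypothetical context options; the article names website and social media.
CONTEXT_OPTIONS = ["Website", "Social media"]


def build_context_question(options: list[str]) -> str:
    """Render the clarifying question with numbered answer options."""
    lines = ["Where will the text be published? Reply with the number only."]
    lines += [f"{number}. {option}" for number, option in enumerate(options, start=1)]
    return "\n".join(lines)


def resolve_choice(reply: str, options: list[str]) -> Optional[str]:
    """Map a reply such as '2' or ' 2. ' to the matching option, else None."""
    cleaned = reply.strip().rstrip(".")
    if cleaned.isdecimal() and 1 <= int(cleaned) <= len(options):
        return options[int(cleaned) - 1]
    return None


print(build_context_question(CONTEXT_OPTIONS))
print(resolve_choice("2", CONTEXT_OPTIONS))
```

Writing the expected exchange down this explicitly also gives you concrete test cases for checking whether the bot sticks to the numbering.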

Testing, testing, testing – in Microsoft Teams

Most of the development time consisted of intensive testing. Every word change in the instructions had the potential to solve a problem (and/or create five new ones). With each test run, we were able to identify further weaknesses in the instructions and solve them step by step.

We wanted to include our colleagues, who would later work with Toni on a daily basis, as test subjects in these test runs, as the bot had to work for them in particular. Since we at Centigrade are convinced that asking our colleagues what they want for their working environment always leads to better work results, we did the same in this case. The result: nobody felt like logging into the OpenAI Playground all the time and fiddling around there, waiting forever for a response from Toni without knowing whether he was typing. (Do bots actually type? Yes, right?)

That’s why we integrated Toni into Teams, our communication tool of choice for all situations. There he appears in the list of contacts like a colleague and can simply be chatted to. And you can even see that he is typing. Integration into everyday working life doesn’t get more seamless than that.

Important: Not every change was always for the better! That’s why it’s crucial to always document previous versions of the instructions before you change them, so that you can revert to the previous version if necessary.
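How you document previous versions is up to you; a shared document or a Git repository works just as well. As a minimal sketch of the idea, assuming the instructions are a single block of text, a small history object could look like this (the class and method names are hypothetical, and the history only lives in memory):

```python
class InstructionHistory:
    """Keeps every version of the instructions so a change can be undone."""

    def __init__(self, initial: str) -> None:
        self._versions = [initial]

    @property
    def current(self) -> str:
        """The version that is currently in use."""
        return self._versions[-1]

    def update(self, new_instructions: str) -> None:
        """Store the new version on top; older versions stay available."""
        self._versions.append(new_instructions)

    def revert(self) -> str:
        """Drop the latest version and return the one before it."""
        if len(self._versions) > 1:
            self._versions.pop()
        return self.current


history = InstructionHistory("Be relaxed, sweet and funny.")
history.update("Be relaxed, sweet and funny. Gently reprimand missing greetings.")
print(history.revert())
```

The important part is the habit, not the tool: save the old version first, then change the instructions.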

A second smart strategy we tried during testing was to ask Toni himself why he does some things wrong and what his instructions should be to make him behave as desired. This approach was only of limited help, as Toni reacted extremely submissively to criticism and we were unable to glean many constructive tips from this behavior. Even worse: we felt bad.


Examples

Toni sometimes comments on certain mistakes according to his character – here, for example, a spelling mistake in our company name:

[Screenshot: Voice & Tone Bot in Teams]

Here is an example of Toni’s typical flow and the reprimand for a missing greeting (now erased from his instructions):

[Screenshot: Voice & Tone Bot in Teams]

And here’s another example of how Toni can also help generate ideas for headlines or meta-info:

[Screenshot: Voice & Tone Bot in Teams]

Conclusion

The development of the Voice & Tone bot was an iterative process with many challenges. But through easy access via Teams and continuous testing and adjustment, we were able to develop a bot that is not only functional, but also has an endearing and humorous personality, and is (hopefully) a bit of fun to work with. Toni is now in daily use – and not just for our marketing team.

But the work on Toni is not finished and perhaps never will be. We have a regular meeting with all team members who work with Toni, where we discuss problems and consider how we can solve them via our prompt engineering.

If you also want to develop an AI assistant, then take the following lessons from us with you:

Expectation management: An AI assistant always remains a WIP (“work in progress”), or at least an RUR (“result under review”).
AI assistants make mistakes, so it is important not to use their results unchecked. This is also part of expectation management.
Don’t underestimate how much time testing and iterating takes – you can’t do without it!

And I’ll leave the last word to Toni:

[Screenshot: Toni’s closing words]
