Behavioral Prototype: “Dubs,” the health chatbot

Developing + testing a chatbot with the Wizard of Oz method

Cameron Wood
Nov 24, 2020

Our challenge this week was to build and test a behavioral prototype for a voice user interface (VUI) or a chatbot. The challenge came with several constraints. First, the prototype had to simulate an interactive experience between a user and an interface. It also had to allow for real-time modification of the experience. Finally, it had to make efficient use of low-cost materials. Before getting into the ideation process, it's important to understand the testing method we were asked to use for this assignment.

To test the prototype, teams were required to use the Wizard of Oz method. This is a usability prototyping method that excels at simulating advanced functionality by leading the user to believe that the system is working automatically. For instance, a team of researchers could ask a user to draw in the air while showing them a screen that mimics those motions. The user may think the system is copying their motions, but in reality another researcher is mimicking the user and drawing for them. The goal is to create the illusion of a finished, functional system while performing all of the actions manually behind the scenes.

After understanding the problem space and the testing method, we were ready to ideate a solution. We began by discussing whether we wanted to pursue a VUI or a chatbot. None of us wanted to build on the VUI we had prototyped the week before, so we decided to explore a chatbot. After discussing various specialties for our chatbot, we settled on healthcare, since it seemed particularly useful for users.

Our goal for the study was to evaluate the desirability of our prototype and determine whether it was worth pursuing. In particular, we wanted to learn how engaging the prototype was, how well the conversation flowed, and whether the user would want to keep using the product. We would know our test was successful if the user felt they were engaging with a chatbot rather than a human, and if there were minimal errors in the conversation.

We tested out our health chatbot using Facebook Messenger.

Prototype

Our health chatbot “Dubs” is an automated messaging system built on Facebook Messenger. Acting as the user’s personal health assistant, Dubs can accomplish the following tasks:

  • Provide a diagnosis based on reported symptoms
  • Provide lists of symptoms upon request

After users input the symptoms they are experiencing, Dubs provides an initial assessment, then asks follow-up questions and walks users through the process so they can get a diagnosis that is as accurate as possible. A rough sketch of this flow follows.
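
In our prototype this flow was performed by a human wizard, but the logic is easy to sketch. Below is a minimal, hypothetical Python sketch of the assessment loop; the symptom data, question wording, and the `assess` function are our own illustrative assumptions, not part of the actual script:

```python
# Hypothetical sketch of Dubs's assessment flow: take initial symptoms,
# ask one follow-up question per symptom, then return an assessment.

FOLLOW_UPS = {
    "cough": "Is your cough dry, or are you coughing something up?",
    "fever": "Have you taken your temperature? If so, what was it?",
    "fatigue": "How long have you been feeling unusually tired?",
}

def assess(symptoms):
    """Collect a follow-up answer for each reported symptom, then summarize."""
    answers = {}
    for symptom in symptoms:
        question = FOLLOW_UPS.get(
            symptom, f"Can you tell me more about your {symptom}?")
        answers[symptom] = input(f"Dubs: {question}\nYou: ")
    # A real system would score the answers against medical guidance;
    # this sketch simply acknowledges the reported symptoms.
    print(f"Dubs: Based on {', '.join(symptoms)}, your symptoms may be "
          "consistent with Covid-19. Please consider getting tested.")

if __name__ == "__main__":
    assess(["cough", "fever"])
```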

While interacting with Dubs, users learn its purpose from the greeting “I’m your personal health assistant.” Dubs also prompts users to describe their condition by asking “What can I help you with?” If users type “What can you do?” or “Help,” Dubs provides a list of its core tasks. If users repeatedly give incorrect commands, Dubs offers them the option to hear a list of commands.

Dubs also tolerates a reasonable amount of flexibility in user input. It responds based on keywords in the user’s message, and it offers a list of commands and hints if several input errors occur in a row.
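
During testing our wizard applied these rules by hand, but the keyword-and-error logic can be expressed compactly. Here is a minimal sketch, assuming simple substring matching and a threshold of two consecutive unrecognized inputs — the threshold and all wordings are our assumptions, not the prototype’s actual rules:

```python
# Illustrative keyword dispatch with a consecutive-error fallback.

KEYWORD_RESPONSES = {
    "symptom": "Common Covid-19 symptoms include fever, cough, and fatigue.",
    "help": "I can (1) assess your symptoms or (2) list symptoms for an illness.",
}
ERROR_THRESHOLD = 2  # unrecognized inputs in a row before offering commands

def respond(message, error_count):
    """Return (reply, updated_error_count) based on keywords in the message."""
    text = message.lower()
    for keyword, reply in KEYWORD_RESPONSES.items():
        if keyword in text:
            return reply, 0  # a recognized input resets the error count
    error_count += 1
    if error_count >= ERROR_THRESHOLD:
        return ("I'm having trouble understanding. Would you like to hear "
                "a list of commands?"), 0
    return "Sorry, I didn't catch that. Could you rephrase?", error_count
```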

We included template responses in order to prepare for possible interactions.

We also designed template responses covering an introduction, error states, repeated errors, ambiguous requests, and a conclusion. With these templates, our wizard could mimic Dubs’s bot-like interaction by sending back standard, templated replies.
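
In practice the template sheet worked like a lookup table our wizard copied from. A rough illustration of its shape — the wording below is invented for this example, not our actual copy:

```python
# Stand-in for the wizard's copy-paste sheet of template responses.
TEMPLATES = {
    "introduction": "Hi! I'm Dubs, your personal health assistant. "
                    "What can I help you with?",
    "error": "Sorry, I didn't understand that. Could you rephrase?",
    "repeated_error": "I'm still not following. Would you like to hear "
                      "a list of commands?",
    "ambiguous_request": "I can help with a couple of things. Did you want "
                         "a symptom assessment or a list of symptoms?",
    "conclusion": "I hope that helps! Message me anytime you're feeling unwell.",
}
```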

We modeled a script for common user interactions.

To ensure the testing ran smoothly, we wrote up a script covering the introduction, task instructions, and follow-up questions. The script included common user interactions with Dubs for our two core functions, so we could prepare responses to commonly asked questions and consider possible edge cases.

We created the script based on two interactions involving Covid-19 symptoms.

Analysis

To assess the functionality and desirability of our prototype, we ran a test session using the Wizard of Oz method. We conducted the session via Zoom, and used Facebook Messenger for our chatbot. Our team split into three roles: facilitator, videographer, and “wizard.” At the beginning of the session, we informed the participant that he would be using our chatbot, but we didn’t tell him that the chatbot was actually one of our teammates.

We prompted the user with the following scenario: “You are sick and experiencing Covid-like symptoms. Use the chatbot to assess your illness.” The participant sent messages to the chatbot, and our “wizard” copied and pasted responses from our chatbot script.

At the end of the session, we asked our participant the following post-test questions:

  • Were you frustrated by anything while using the chatbot? Why?
  • What do you like the most about the chatbot? Why?
  • Would you like to use the chatbot as a product in the future? Why or why not?

The participant expressed little frustration using the chatbot. The tasks ran smoothly, and there were virtually no pauses between responses. Additionally, we ended up using roughly 90% of our script and none of our error messages. The only mistake during the test session was an unnecessary message from our “wizard,” but this did not significantly interrupt the flow of the conversation. The participant noted that he appreciated the ease of use, but said he would likely not use the chatbot in the future because he prefers Googling his symptoms.

The success of our testing session showed us that our script was well thought-out for the included tasks. We anticipated every query our participant sent, and had the proper messages to send back. Although this testing session proved successful, we still have plenty of room for improvement.

First, we could make our chatbot more robust. It worked well when assessing Covid-19 symptoms, but it didn’t have much functionality beyond that. To improve it, we could identify other important interactions, then create scripts and flow diagrams to handle those queries. Second, we could conduct more testing sessions to further assess the chatbot’s desirability. The participant we tested with didn’t like using chatbots, so we weren’t really able to evaluate this part of the prototype. Testing with users who have experience with, or regularly use, chatbots could give us better insight. If more data pointed to a lack of desirability, we could then work on making the interactions more helpful and useful.

