Speech Improvement Voice User Interface Design: VocaPal
Overview
VocaPal is a Voice User Interface (VUI) that provides vocal exercise sessions to users who wish to prepare or improve their speaking voice/techniques.
Purpose
1. Address the anxiety of public speaking
2. Build a prototype to help enhance the user’s speaking voice
My Role
User Research
Usability Testing
Flow Charts
Personality Design
Intents
Sample Scripts
Prototyping
Tools
Lucidchart
Voiceflow
Google Forms
Project Type
Conversational UX Design
Team
Myself
Period
Oct 2021 - Dec 2021
Concept
VocaPal is a voice user interface (VUI) prototype for a potential multi-modal application, created to address the intimidation and anxiety of public speaking.
Effective verbal communication depends on how a message is conveyed through one’s voice. Whether it is making regular presentations in front of a group or just talking to family and friends, there are plenty of situations where good speaking skills and voice control can help a person advance in their career or create opportunities. Despite the fact that most everyday interactions involve speaking, there are many people who feel uncomfortable, intimidated, and anxious when it comes to certain public speaking tasks. Fortunately, there are techniques that anyone can learn to speak more effectively, and make meetings, presentations, and speeches less daunting.
Catering to anyone who wishes to improve their speaking voice, VocaPal provides voicebot-led vocal exercise sessions to help enhance the user’s voice and delivery across all communication contexts.
Using voice input, the VocaPal prototype can achieve the following key interactions:
Research
User Survey
A user survey was conducted to understand how people prepare their voice before a speaking scenario. Besides asking participants how they generally practice speaking, the survey also gauged the emotional burden of speaking in front of other people through participants' confidence levels.
To target participants who regularly speak in front of other people or are interested in public speaking, the survey was sent to students and faculty of the Pratt Institute Information Experience Design program, who frequently give project presentations; to Reddit’s r/PublicSpeaking online community; and, through Facebook and LinkedIn posts, to individuals whose professions require public speaking. There were 30 responses in total, and the majority of participants (66.7%) were between 25 and 34 years old.
Although participants' comfort level with speaking in front of others varied, the survey found that most participants (66.7%) habitually practiced using their voice before a speaking situation. The top reasons were:
To sound more confident (72.7%)
To calm down or reduce stress (72.7%)
To produce a clear voice (68.2%)
To warm up (50%)
To practice pronunciation (27.3%)
The most common voice preparation method among participants was reading a script for a presentation or meeting out loud. However, many participants were also interested in trying or learning more about other commonly used voice training methods.
The popular methods alongside script reading were:
Breathing exercises (43.3%)
Vocal warm-up exercises (26.7%)
Pronunciation exercises (26.7%)
Competitive Analysis
Four competitor mobile applications that offer exercises for speech improvement were selected based on their high ranking and recommendations in the Google Play Store.
The applications were analyzed on the following properties, which were expected to have a crucial impact on a conversational interface that guides voice exercises: Voice Lesson Format, Level of Personalization, Navigation, and GUI Appearance. The competitive analysis revealed that no application yet offered a fully voice-based service. The best practices, and solutions to common weaknesses found in the analysis, were compiled into the following guidelines for designing VocaPal as an advanced vocal exercise guide:
To create a user-friendly voice lesson format:
Recommend a series of vocal exercises that best serves the user based on an assessment of what the user wants to improve about their voice
Make the time duration of individual exercises transparent
Provide a variety (more than one) of vocal exercises in a lesson
To personalize the exercise experience:
Allow users to set reminders for a routine practice
Register the user’s name and remember what the user wants to improve on to form personalized lesson plans
To make the interface easy to navigate:
Provide ways to exit or pause a lesson, and do not attach a negative connotation to leaving a lesson
Give an overview of the lesson plan, or allow the user to freely skip parts of the lesson. This will give more control to the user and not force them to complete exercises they do not feel comfortable doing
For a useful GUI Appearance:
Provide visual cues when it is time for the user to speak
Avoid providing too much content (text, menu options, etc.), or break the content into chunks if it will be long/overwhelming for a user to process
Provide illustrations demonstrating the exercises when appropriate, along with the vocal or textual instructions
Flow Chart
A flow diagram was made to understand what path the conversation with VocaPal can take, and map out all the different possibilities with the correct intents and prompts. Overall, VocaPal’s conversation can go through and between the following phases: Introduction, Settings, and Practice Phase.
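The movement between phases can be sketched as a small state machine. This is an illustrative Python sketch only: the three phase names come from the flow diagram, while the intent names and the specific transitions are hypothetical placeholders rather than the prototype's actual intents.

```python
# Allowed transitions: (current phase, intent) -> next phase.
# Phase names follow the flow chart; intent names are hypothetical.
TRANSITIONS = {
    ("introduction", "open_settings"): "settings",
    ("introduction", "start_practice"): "practice",
    ("settings", "start_practice"): "practice",
    ("settings", "go_back"): "introduction",
    ("practice", "open_settings"): "settings",
    ("practice", "finish_lesson"): "introduction",
}


def next_phase(current: str, intent: str) -> str:
    """Return the phase after handling an intent; stay put if it is not recognized."""
    return TRANSITIONS.get((current, intent), current)


phase = "introduction"
for intent in ["open_settings", "start_practice", "unknown_intent"]:
    phase = next_phase(phase, intent)
    print(intent, "->", phase)
```

Keeping the conversation in its current phase when an intent is not recognized is one simple fallback; a real voicebot would more likely respond with a reprompt.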
Bot Persona
Speech improvement and voice practice can put people in a vulnerable position as it can sometimes produce stressful and emotional responses. To guide users with various levels of confidence in their voice through specialized exercises, it was determined that VocaPal should have the personality of a casual friend who is knowledgeable about vocal exercises, encouraging, and willing to help the user improve their speaking voice. The following personality specifications were established to guide the sample scripting process.
Interaction Goals
i. Guiding / Instructive
The voicebot should be knowledgeable about different voice exercises, and act as a reliable guide that clearly explains how to perform each vocal exercise it recommends to the users. The reliability will also provide a level of calmness that can help reduce the user’s anxiety.
ii. Fun
Being fun will help alleviate the stress that the users will be feeling before they have to speak publicly, and make the series of exercises easier for the users to complete.
iii. Flexible
The voicebot can recommend voice exercises based on what the user wishes to improve about their voice, as well as personalize the amount of time an exercise session will take.
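A minimal sketch of how such a recommendation could work, assuming each exercise is tagged with the goals it supports and an approximate duration. The exercise names, goal labels, and durations are hypothetical, and the greedy selection shown here is just one simple strategy, not the logic used in the prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exercise:
    name: str
    goals: frozenset  # goals this exercise supports
    minutes: int      # approximate duration, shown to the user up front


# Hypothetical catalog; the categories echo the methods named in the survey.
CATALOG = [
    Exercise("breathing", frozenset({"calm", "confidence"}), 3),
    Exercise("vocal warm-up", frozenset({"clarity", "confidence"}), 4),
    Exercise("pronunciation", frozenset({"clarity"}), 5),
]


def plan_session(goal: str, minutes_available: int) -> list:
    """Pick exercises matching the goal, in catalog order, within the time limit."""
    plan, remaining = [], minutes_available
    for exercise in CATALOG:
        if goal in exercise.goals and exercise.minutes <= remaining:
            plan.append(exercise)
            remaining -= exercise.minutes
    return plan


session = plan_session("clarity", 6)
print([e.name for e in session])  # ['vocal warm-up']
```

With six minutes available, only the four-minute warm-up fits, because the five-minute pronunciation exercise would exceed the remaining time.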
Level of Personification
VocaPal should have a medium level of personification, which is enough for it to form a familiar relationship with the user while still being able to provide accurate knowledge about the vocal exercises.
Power Dynamics
What is the power dynamic in the conversation?
VocaPal and the user will have a close to equal relationship, because the system will have a personality similar to that of a casual friend who is easy to speak with when feeling stressed, emotional, or vulnerable. The voicebot will take some leadership when it recommends exercises based on the user’s goal, whereas the user will have the power to complete the recommended exercises and ask for a set of exercises for a specific amount of time.
How intimate does this relationship need to be?
The relationship does not need to be too intimate. The conversational interface should be approachable and casual enough for the user to train their voice and overcome speaking anxiety.
How will their relationship change over time?
The relationship will become personalized after the first use. The interface will remember the user’s practice reminders, and what type of exercise they need to achieve their goal, making VocaPal a reliable guide.
Tone
*Based on the 4 dimensions of voice from Conversations with Things.
Character Traits
The user survey confirmed that many people feel anxiety and stress to some extent before speaking in front of others. It is assumed that this feeling is amplified now that people are returning to in-person work environments after a long period of being forced to work alone from home.
An encouraging, supportive, and confirming interface that accompanies a user and boosts their confidence before a meeting, presentation, or speech can help alleviate some stress and create a positive speaking experience.
Prototyping
Sample Scripts
Along with training data consisting of prompts and expected utterances, six sample scripts were written with the personality specifications in mind to get a sense of the paths users could take with VocaPal.
Audio Demo Prototyping
A low-fidelity prototype consisting of a series of audio demos that acted out the initial sample scripts was made to understand users' first impressions of VocaPal’s personality, exercise usability, conversation flow, script clarity, and anticipated challenges. The demos were made by joining audio clips of the prompts, read by the machine-generated voice on Voiceflow, with the user's lines, which I voiced myself.
Usability Testing Round 1
Moderated remote usability tests were conducted over Zoom with 3 participants who reported that they frequently speak in front of others. Each person was asked to listen to sections of the audio demo and think aloud.
The feedback focused primarily on the voicebot’s robotic voice, unnatural way of talking, and speech rate, as well as the lack of visual cues or instructions.
Voiceflow Interactive Prototyping & Iterations
For the next iteration, the initial sample scripts and flow diagram were updated based on the findings from the first round of usability testing. A high-fidelity interactive VUI prototype that incorporated the following opportunity areas was made with the Voiceflow tool.
Usability Testing Round 2
The second round of usability testing was conducted with the interactive prototype in the same manner as the first round, but with three different participants.
Participants expressed the same dissatisfaction with the robotic and unnatural-sounding voice as in the first round. There was also new feedback on the speech rate, various misinterpreted utterances, the monotonous list of reading tips, and the slow loading time for images and GIFs.
Iterations Round 2
The following solutions proposed after the second round of usability testing were implemented to refine the interactive Voiceflow prototype.
Although a measure to fix the robotic and unnatural-sounding voice was proposed (replacing VocaPal’s voice with a real human voice), it was not implemented due to time and resource constraints. The training data, prompts, sample scripts, and flow diagram were all updated accordingly.
Getting Feedback from a Conversational UX Design Expert
A video demonstration of the final VocaPal prototype was shown to several Conversational UX Design Experts.
Mr. Tonye, a software engineer who has worked on conversational AI and chatbot design projects, suggested giving quizzes on the covered tips as a way to confirm users' learning and progress during the reading tips lesson.
Conclusion
Design to help speaking
This project gave me the opportunity to revisit the difficulty of speaking in front of others and understand various speech improvement exercises.
Solo work
This was a solo project in which I was responsible for every part of the design process, which left me feeling more independent and more confident in my decision-making skills by the end of the project.
Conversational UX Design Lesson
This was my first conversational UX design project, and it gave me experience adapting to and embracing new design tools and techniques such as Voiceflow prototyping, Lucidchart flowcharting, and sample scripting.
Reflection
There were a couple of limitations encountered during the design process, which will need to be addressed if VocaPal becomes a real product.
Few usability tests with non-native English speakers
Only one of the six participants recruited for the usability tests was a non-native English speaker, and her test session provided valuable insight into the racial/first-language bias of VocaPal. It was observed that VocaPal misrecognized this participant’s utterances more often, apparently because of her accent. Conducting more usability tests with participants of various accents and language backgrounds would yield word error rates that can help identify areas of bias inside VocaPal.
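Word error rate (WER) is commonly computed as the word-level edit distance between what the participant said and what the recognizer transcribed, divided by the number of words the participant said. A minimal Python sketch, assuming both transcripts are available as plain text; the function name and the sample phrases are hypothetical and not taken from the test sessions:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (Levenshtein distance over words).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: what the user said vs. what the recognizer heard.
said = "start the breathing exercise"
heard = "start the reading exercise"
print(word_error_rate(said, heard))  # 1 substitution / 4 words = 0.25
```

Comparing the average WER across groups of participants with different accents or first languages would be one way to quantify where recognition is weakest.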
Reduced engagement and adherence of participants
During the sessions, participants were asked to think aloud, but some were too focused on the task to explore other areas of VocaPal on their own. Some participants also forgot to think aloud as they performed the tasks, so the moderator (me) had to probe them about how they felt and what they were doing, while trying not to lead them or distract them from finishing a task.
Next Steps
I would like to incorporate the following insights and feedback that I gained from my last round of usability testing and my presentation.
Replace the machine-generated voice with a human voice
To prevent users from picking up unnatural ways of speaking, it would be better to have the instructions and demonstrations read by a real person, preferably a voice trainer.
Add and test a text input feature
Initially, the pronunciation exercise was supposed to include an interaction where the user types in a word they have trouble pronouncing for VocaPal to say out loud to them. Unfortunately, this could not be tested because there was no text input option for a Voiceflow voice project.
Add a speech rate adjuster for the user
The tests revealed that speech rate is perceived differently from user to user. The rate that seemed normal to most participants did not work for those who wanted to listen carefully or who needed more time to comprehend because English was not their first language. A feature that lets the user adjust how fast VocaPal talks would be helpful.
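One possible way to expose this setting, assuming the text-to-speech engine behind the voicebot accepts SSML, is to wrap each prompt in a `prosody` element whose `rate` attribute reflects the user's preference. The helper name, the preset labels, and the percentages below are illustrative assumptions, not part of the VocaPal prototype:

```python
from xml.sax.saxutils import escape

# Hypothetical presets the user could pick by voice (for example, "speak slower").
RATE_PRESETS = {"slow": 80, "normal": 100, "fast": 120}


def with_speech_rate(prompt: str, preset: str = "normal") -> str:
    """Wrap a prompt in SSML so it is spoken at the chosen rate."""
    if preset not in RATE_PRESETS:
        raise ValueError(f"unknown speech rate preset: {preset!r}")
    rate = RATE_PRESETS[preset]
    # escape() protects characters such as & and < that would break the markup.
    return f'<speak><prosody rate="{rate}%">{escape(prompt)}</prosody></speak>'


print(with_speech_rate("Breathe in & hold for four counts.", "slow"))
```

The range of supported rate values varies by speech platform, so the percentages would need to be checked against the target engine.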
Provide reading tip quizzes
As recommended by Mr. Tonye.