We all know how crucial the Product-Market fit is to the viability, let alone the success, of a product. Build a product that is a natural fit for tween girls and market it to busy moms, and you will most probably end up with a failed product, no matter how sweet the product or how slick and well financed the marketing push.
In fact, the number one mistake that startups make is neglecting this very first step. They start building something with only a vague notion, if any, of who the ideal target user is, dive into the fun work of ideating and building features and creating lots of cool bells and whistles, and then launch to the world at large, expecting it to embrace their beautiful baby. Often, they never address the fundamental questions: Who is the target user? What problem are we solving for such users (or what additional value are we bringing to their lives)? And how are we going to monetize the value that we are delivering, to an extent that will enable us to survive and thrive as a company (i.e., price the product so that the company is profitable)?
A parallel mistake that is often made by builders of voice experiences is building a voicebot without first asking the basic question: “Is the voicebot a good fit for the use case?” Instead, many builders plunge into the hard work of designing and coding up their voicebot, laboring under the unspoken assumption that if a non-voice experience exists for a use case (a website, a text-based chatbot, a visual-tactile mobile app), then an experience can and should be built for voice. That is, one should go ahead and build the “equivalent” or “parallel” voice version of those non-voice experiences.
In the previous chapter, we touched on “The Three Characteristics of Conversational Voice”: Time linearity, Uni-directionality, and Invisibility. We argued that these dimensions are unique to the voice user interface and that they could present a challenge to designers who build graphical/tactile interfaces if they don’t keep them in mind. In fact, many novice voicebot designers view these characteristics as weaknesses, or challenges, and point to them as the main reasons why voice user interfaces are unpleasant and unpopular among users.
We believe that such a stand betrays a fundamental flaw in thinking: interfaces cannot be fully described by simply enumerating a list of properties (time linear, unidirectional, ephemeral). They also need to be described in terms of what users can do with them. In fact, to really get to the heart of the matter, we need to talk about not only the properties of these interfaces and what users can do with them but also, and perhaps most importantly, what users want to do with them: their intentions. Only when we have identified a specific “use case” can we determine how well the voice user interface will perform as a tool for completing the user’s task. By a use case we mean a user who comes to a situation with a set of situational attributes (for instance, they are preparing food, they are typing, or they are lying in bed with their eyes closed) and a set of intentions (they wish to memorize facts, listen to music, hear the latest quotes for stocks they care about, or turn off the lights).
Here’s a use case to illustrate what we mean.
When we are trying to memorize something (say a list of facts, a poem, or a proverb), we usually do it in a repetitive back and forth, where we speak or mouth what we are trying to memorize, get feedback as to whether we are correct, and hope that we will get it right on the next try. Ideally, we are working with someone else who asks us questions while we pace back and forth, giving answers, receiving feedback, and then moving on to the next question. Ideally, too, the back and forth is time constrained (answer me quickly), we move from one question to the next linearly, and our eyes are closed (as they naturally often are when we are memorizing). In other words, the “interface” is time linear, unidirectional, and invisible.
The voice first conversational user interface, being time linear, unidirectional, and invisible, would in such a use case be an effective tool for enabling the user to fulfill their want (to memorize something).
In this case, the voice first conversational user interface is powerful because it is temporal and invisible (in a word, ephemeral), and because it demands the user’s attention: it requires that they speak up, that they be constantly present and not wander off, that they engage in a focused way. If one wants an interface that lets the user wander off once in a while, or not pay close attention to what is asked of them, or take their time rather than respond quickly, then the conversational voice first interface is not a fit, and no matter how talented the VUI designer may be, or how many usability magic tricks they may pull out of their hat, the experience will be poor.
Good examples of use cases that do not fit the conversational voice interface are booking a trip, following the steps of a cooking recipe, finding out what movies are playing at the mall, or answering a 10-question survey. VUI designers can no doubt craft highly usable VUIs for these, but given the choice between a tablet and a voicebot, users will select the tablet every time, unless, of course, they are not able to use the tablet for whatever reason (they can’t see, can’t touch, etc.).
The bottom-line point really is this: if you want to build a voicebot that cannot be beaten and you want to deliver truly new value, develop a bias for those use cases where no interface, no matter how rich, can beat the voice-only interface.
Here are 14 basic heuristics that should be of use in at least two ways: (1) they can help you think of use cases where a voicebot will deliver a compelling experience that is superior to one delivered by other interfaces (for instance, visual-tactile mobile apps), and (2) they can help you assess how good a voicebot will be for a given use case. The first is useful if you are an entrepreneur looking to come up with a business idea. The second is useful if you are a product manager or a designer looking to understand how much work you will need to do to bridge the gap between the use case and the UI.
Here are the heuristics. The more of them you can answer with “Yes,” the more compelling your voicebot will be.
The user is not able to or does not want to use their hands.
The user is not able to or does not want to use their eyes.
The user can easily respond quickly when it is their turn to speak.
It is a desirable thing for the user to be forced to respond quickly when it is their turn to speak.
The user is able to listen carefully to what is being said to them.
It is a desirable thing for the user to be forced to listen carefully to what is being said to them.
The user is able to speak up.
It is a desirable thing for the user to be forced to speak up.
The user can easily enunciate clearly.
It is a desirable thing to force the user to enunciate clearly.
The user can easily remain focused.
It is a desirable thing to force the user to remain focused.
The user is able to be patient and is not in a hurry.
It is a desirable thing that the user is forced to be patient.
Note that the above-mentioned use case, in which the target user wishes to memorize facts without looking at or touching anything, satisfies every single one of the 14 heuristics.
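For readers who like to make such an assessment concrete, the checklist can be turned into a simple tally. What follows is only a minimal sketch under stated assumptions: the names HEURISTICS, voice_fit_score, and unmet_heuristics are hypothetical, and weighting all 14 heuristics equally is our simplification for illustration, not a claim that each one matters equally in practice.

```python
# The 14 heuristics, quoted from the list above.
HEURISTICS = [
    "The user is not able to or does not want to use their hands.",
    "The user is not able to or does not want to use their eyes.",
    "The user can easily respond quickly when it is their turn to speak.",
    "It is a desirable thing for the user to be forced to respond quickly when it is their turn to speak.",
    "The user is able to listen carefully to what is being said to them.",
    "It is a desirable thing for the user to be forced to listen carefully to what is being said to them.",
    "The user is able to speak up.",
    "It is a desirable thing for the user to be forced to speak up.",
    "The user can easily enunciate clearly.",
    "It is a desirable thing to force the user to enunciate clearly.",
    "The user can easily remain focused.",
    "It is a desirable thing to force the user to remain focused.",
    "The user is able to be patient and is not in a hurry.",
    "It is a desirable thing that the user is forced to be patient.",
]


def voice_fit_score(answers):
    """Return the fraction of heuristics answered "Yes" (True).

    Assumption: every heuristic carries equal weight.
    """
    if len(answers) != len(HEURISTICS):
        raise ValueError(f"expected {len(HEURISTICS)} answers, got {len(answers)}")
    return sum(1 for a in answers if a) / len(HEURISTICS)


def unmet_heuristics(answers):
    """Return the heuristics answered "No": the gap the design must bridge."""
    if len(answers) != len(HEURISTICS):
        raise ValueError(f"expected {len(HEURISTICS)} answers, got {len(answers)}")
    return [h for h, a in zip(HEURISTICS, answers) if not a]


# The memorization use case answers "Yes" to all 14 heuristics.
memorization = [True] * len(HEURISTICS)
print(voice_fit_score(memorization))   # 1.0
print(unmet_heuristics(memorization))  # []
```

An entrepreneur might use the score to rank candidate use cases, while a product manager or designer would probably get more out of the list of unmet heuristics, since each “No” points to a gap between the use case and the UI.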