Chapter 2. The Rules of Conversation

Language is far more than a mere collection of symbols that human beings use to exchange information about themselves and the world around them. Words exist to act upon the world,1 and they are always used within a context: they are used by people (or by machines), at a certain time, at a certain place, with a set of goals or intentions, to create some effect or provoke some counteraction, and they are directed at a certain audience. More than that, they are spoken or written within a stream of other words, and they are delivered in a certain way.

We saw in the previous chapter how the meaning of “That’s great! That’s all we need!” varied depending on whether it was said in response to a statement announcing good news (the reaction thus being an expression of happiness and enthusiasm) or bad news (expressing deflation and dismay). In active, real-time conversations between humans (for instance, telephone conversations), the context is immediate, and the full meaning of words and statements is constructed by a complex set of influences. How the conversation was started, for instance, immediately creates a context and framing that can greatly influence the rest of the exchange: was it opened with a perfunctory “Hello,” or was the “Hello” skipped, with the person instead opening with “What’s going on here?” (signalling distress or anger) or “So, this is what I think…” (expressing an intimacy and informality that a perfunctory greeting would mar)? Throughout the conversation, the meaning of what is said will depend heavily on things like the volume with which a statement is spoken, its intonation and emphasis, whether it was said with hesitation or firmly, and whether it was followed by a chuckle or a yawn. In other words, the meaning of whatever is said is a creature of the precise flow between the conversational participants. Moreover, each participant will have their own interpretation, and those interpretations may at times diverge widely: I may say something in jest only to have my counterpart take it literally. And when I detect the misalignment, I will probably point it out explicitly, as in: “Come on, you can’t be serious. You know I was joking!”

At first glance, then, the complexity of conversations may leave us with the impression that conversational interactions are highly entropic activities, much too chaotic for systematic modeling. But it turns out that conversations are highly structured interactions that observe a well-defined set of rules. Adherence to these rules is expected, while deviation from them becomes a source of meaning. For instance, just as the volume with which a statement is spoken is a source of meaning during a verbal exchange (e.g., someone raises their voice in anger), the act of someone suddenly changing topics on us during a conversation is also a source of meaning that we note and try to actively interpret and understand. But more than that: deviations from the official protocol can themselves be devices that participants purposefully use to establish just the right protocol for the conversational situation. For instance, two intimate friends conversing might skip formalities, interrupt each other, and finish each other’s sentences -- in other words, engage in behavior that would be deemed inappropriate between two people who are not intimate with each other. Or those very same friends may revert to the formal protocol to signal that the situation is not normal: for instance, that they are mad at each other or are in a social setting where such behavior is not appropriate.

Deviations from expected behavior can not only reflect relationships (e.g., friendship) and circumstances (a formal dinner party) but can themselves be transformational. A person wishing to get closer to another may intentionally diverge from the protocol to signal their intention to become more intimate and, if the other person accepts the divergence, to effect that intimacy through that very action. For instance, instead of opening a conversation with a formal “Hello,” the person wishing to move closer may open it with, “So I had this thought….” Should the other person accept this bid at establishing intimacy, then intimacy is established; if instead they find the attempt presumptuous, they could respond with, “Sorry, who is this?”

So, yes, conversations are bewildering in their complexity, and human beings are impressively effective at navigating that complexity, but conversations are certainly not chaotic. In fact, the complexity is made possible precisely because a well-defined, commonly adopted protocol is available to both participants, who use that protocol (as well as acts of adherence and non-adherence to it) to send each other signals at many levels.

A structuring paradigm that we have found useful in making sense of why people behave the way they do when they engage in conversations with each other is the one proposed by the British philosopher of language Herbert Paul Grice.2 Grice stipulated that conversations are governed by a guiding principle and four maxims.3 When humans enter into a conversation with each other, Grice proposed, this guiding principle and the four maxims come into play in ways that help the participants predict behavior and extract and create meaning.

The Cooperative Principle

Let’s begin by defining what we mean here by “conversation.” For our purposes, a conversation is an exchange willingly entered into by two participants for the purpose of accomplishing a specific goal or set of goals. In other words, when two people enter into a conversation, they are implicitly agreeing to cooperate with each other -- hence, they are both abiding by what Grice calls the Cooperative Principle.

This may seem obvious enough, but in fact, not all verbal interactions between people are conversations in the technical sense in which we are using the term here. An interrogation, for instance, is not a conversation. If the law gives me the option, I may decline to engage altogether (in the American context, I may “plead the Fifth”). Or, if I do decide to engage, I may, under the advice of my counsel, engage minimally: I may be instructed not to lie, for instance, but I am under no obligation to go out of my way to keep the other side from reaching the wrong conclusion, if the wrong conclusion works in my favor. I could stick to answering the letter of the question, or I may avoid using specific terms and make the other side work hard to pin me down when I am being cagey.

In contrast, during a conversation, we are expected to cooperate with each other. I will tell you the truth and only the truth; I will give you as much information as would reasonably be expected of me; I will stay on topic; I will speak clearly; and throughout the conversation, my goal, like yours, will be to be helpful and to move our exchange forward.

How is this useful in the context of an interaction between a human and a voicebot?

At a basic level, understanding that a user enters a conversation with a voicebot with the intent to cooperate should prime the product manager and the designer to work hard to ensure that conversations with voicebots have a very well defined purpose and goal. A user who engages a voicebot to cooperate is coming to it with a problem they want to solve or a goal they want to accomplish, and they want the voicebot to help them do so. In other words: a successful conversation is one where the participants have managed to cooperate with each other to accomplish a goal. If no goal was accomplished because the goal was never well defined, then the conversation was just as much a waste of time as if there were a set of goals but the voicebot was not designed well enough to engage the human competently and effectively.

Second, by understanding this basic fact -- that the Cooperative Principle guides conversations -- a designer should spend the bulk of their energy designing for users who enter their conversations with voicebots intending to cooperate. Those are the users the designer should care about most. Interactions where the user is not cooperating -- for instance, asking for things that are clearly out of scope -- should be identified and dealt with quickly (for instance, by connecting such users to a human being, or providing them with information on how to reach a human being or obtain additional information, and then trying to wrap up the conversation quickly).
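By way of illustration, here is a minimal Python sketch of one way such triage might work. Everything in it -- the intent labels, the escalation threshold, the wording -- is a hypothetical assumption, not a prescription from any particular platform.

    # Minimal sketch of out-of-scope triage. Intent names, threshold,
    # and prompts are hypothetical placeholders.
    IN_SCOPE_INTENTS = {"check_balance", "pay_bill", "report_lost_card"}

    def handle_turn(intent: str, misses: int) -> tuple[str, int]:
        """Return (voicebot reply, updated count of out-of-scope turns)."""
        if intent in IN_SCOPE_INTENTS:
            return "Sure, let's take care of that.", 0
        misses += 1
        if misses >= 2:
            # The user keeps asking for things we can't do: stop
            # spending their time and route them to a person.
            return ("Let me connect you with someone who can help. "
                    "One moment, please."), misses
        return ("I can help with balances, payments, and lost cards. "
                "Which would you like?"), misses

The structure is the point: cooperative users get serviced immediately, while uncooperative requests get one clarifying reprompt and then a fast path out of the conversation.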

The Maxim of Quality

When I engage you in a conversation, I expect you to: (a) speak truthfully -- i.e., not lie; (b) make assertions only to the extent that you believe them to be true, and therefore tell me when you are not sure about something; and (c) avoid saying things that would obviously lead me to draw the wrong conclusions.

How is this useful in the context of voicebots?

At the most basic level, the voicebot should speak truthfully. Again, this may sound like a truism, but it is not, given the many deployed voicebots that make promises they don’t keep. For instance, if the user asks to speak to a human agent and the voicebot signals that it is about to connect them with one, the voicebot should do exactly that instead of, say, transferring them to another voicebot. An example of this that unfortunately happens only too often is when the IVR of a department within a large organization (say, a bank) determines that the caller would be better served by another department, and so transfers the caller to that department’s IVR when the caller had asked for a human.
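One way to guard against that failure in code is to check what actually sits at the other end of a transfer before promising a person. The sketch below is a hypothetical illustration; the routing table and its fields are assumptions, not a real telephony API.

    # Hypothetical routing table: announce a human only when the
    # destination is actually a human-staffed queue.
    TRANSFER_TARGETS = {
        "billing": {"endpoint": "queue:billing-agents", "is_human": True},
        "after_hours": {"endpoint": "ivr:billing-automated", "is_human": False},
    }

    def transfer_announcement(department: str) -> str:
        target = TRANSFER_TARGETS[department]
        if target["is_human"]:
            # Safe to promise a person.
            return "Connecting you with an agent now."
        # Never claim a human is coming when the target is another IVR.
        return ("Our agents are gone for the day, so I'm transferring "
                "you to our automated billing line.")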

The Maxim of Quantity

Here my expectation is that the voicebot will give me as much information as I need, but not more than what I need.

Instead of:

  • Human: Is Gary Danko open today?

  • Voicebot: Yes.

Use:

  • Human: Is Gary Danko open today?

  • Voicebot: Yes. Gary Danko is open today until 9:00 pm.

By the same token, I expect the voicebot not to give me information that I don’t need for the purposes of the conversation at hand.

Instead of:

  • Human: Is Gary Danko open today?

  • Voicebot: Yes. Gary Danko is open today until 9:00 pm. Tomorrow, Sunday, they are open from 12 noon to 7 PM and on weekdays, from 12 noon until 8:00 pm.

Use:

  • Human: Is Gary Danko open today?

  • Voicebot: Yes. Gary Danko is open today until 9:00 pm.
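To make the pattern concrete, here is a minimal Python sketch of a response builder that answers the question asked plus the one detail the caller almost certainly needs next (today’s closing time), and nothing more. The hours table and function names are illustrative assumptions, not data about the actual restaurant.

    from datetime import date

    # Illustrative weekly hours, keyed by weekday (Monday=0 ... Sunday=6).
    HOURS = {
        0: ("12:00 pm", "8:00 pm"), 1: ("12:00 pm", "8:00 pm"),
        2: ("12:00 pm", "8:00 pm"), 3: ("12:00 pm", "8:00 pm"),
        4: ("12:00 pm", "8:00 pm"), 5: ("12:00 pm", "9:00 pm"),
        6: ("12:00 pm", "7:00 pm"),
    }

    def open_today_reply(name: str, today: date) -> str:
        hours = HOURS.get(today.weekday())
        if hours is None:
            return f"No, {name} is closed today."
        # Answer the yes/no question plus the one fact the caller
        # predictably needs next -- not the whole weekly schedule.
        return f"Yes. {name} is open today until {hours[1]}."

Calling open_today_reply("Gary Danko", date(2021, 10, 2)) -- a Saturday -- would yield the reply from the example above.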

The Maxim of Relevance

In addition to expecting the voicebot to give me truthful information and just enough information (not significantly more or less than what we both know is needed) to keep the conversation moving along, I also expect the voicebot to talk about things that are relevant: that is, things that make sense in the sequence of our exchange.

Instead of:

  • Human: Is Gary Danko open today?

  • Voicebot: Yes. Gary Danko is open today until 9:00 pm. By the way, Amazon Photo storage is now available for 25% discount to all Amazon Prime members.

Use:

  • Human: Is Gary Danko open today?

  • Voicebot: Yes. Gary Danko is open today until 9:00 pm.
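A simple way to enforce this in code is to tag every candidate piece of output with the topic it serves and emit only the pieces matching the topic of the user’s question. This is a sketch under assumed names; a real system would derive the active topic from dialogue state.

    # Sketch: keep a reply on topic by filtering candidate utterances
    # against the topic of the user's question.
    def compose_reply(candidates: list[tuple[str, str]], active_topic: str) -> str:
        """candidates is a list of (topic, text) pairs."""
        return " ".join(text for topic, text in candidates if topic == active_topic)

    reply = compose_reply(
        [("restaurant_hours", "Yes. Gary Danko is open today until 9:00 pm."),
         ("promotion", "Amazon Photo storage is now 25% off for Prime members.")],
        active_topic="restaurant_hours",
    )
    # reply == "Yes. Gary Danko is open today until 9:00 pm."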

The Maxim of Manner

Here, the concern is with the language used to communicate the content: the voicebot may be truthful, may be saying as much as it needs to and no more, and may be right on topic, but if it uses terms that I may not be familiar with, then it is violating the Maxim of Manner.

Instead of:

  • Voicebot: Which of the two bills are you inquiring about: S. 1511 or H.R. 2265?

Use:

  • Voicebot: Which of the two bills are you inquiring about: The “Protecting America’s First Responders Act” or The “Financial Exploitation Prevention Act”?

I would also expect the voicebot to be as specific as it can be.

Instead of:

  • Human: Are there any Thai restaurants nearby?

  • Voicebot: Yes. There are several Thai restaurants nearby.

Use:

  • Human: Are there any Thai restaurants nearby?

  • Voicebot: I found a few Thai restaurants nearby. There’s New Star Asian Bistro, 1.2 miles away on Old Dominion Drive; Chiang Mai Thai Cookhouse, 1.6 miles away on Elm Street; and Esaan Northeastern Thai Cuisine, 2 miles away on Old Chain Bridge Road.
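Both manner fixes -- plain names instead of internal identifiers, and concrete details instead of vague counts -- amount to a translation step before speech. Here is a minimal sketch of the first one, using the bill names from the example above; the lookup function itself is a hypothetical illustration.

    # Sketch: never speak an internal identifier the caller may not
    # recognize; map it to a plain-language name first.
    BILL_NAMES = {
        "S. 1511": "the Protecting America's First Responders Act",
        "H.R. 2265": "the Financial Exploitation Prevention Act",
    }

    def speakable(bill_id: str) -> str:
        # Fall back to the raw identifier only if no plain name is known.
        return BILL_NAMES.get(bill_id, bill_id)

    prompt = ("Which of the two bills are you inquiring about: "
              f"{speakable('S. 1511')} or {speakable('H.R. 2265')}?")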

One last note.

As we mentioned, core to our philosophy of design is the proposition that it is a mistake to emulate human-to-human conversations too closely when designing human-to-voicebot conversations. For instance, human beings may very much mind being interrupted, and it would be rude to interrupt a human being and unilaterally take the conversational turn away from them. But with a voicebot, this is not the case: if the voicebot is not giving you what you want, you, the human, should interrupt it and set it back on track (or perhaps ask it to stop talking). A designer who assumes that the user should and will observe the rule of negotiating turn ownership with the voicebot is not only wasting time designing a sophisticated turn-taking mechanism but is probably designing a frustrating experience: will the voicebot object and insist on retaining the turn? Will the user be asked to refrain from interrupting the next time they do? Obviously not.

So, yes, Grice’s model is useful, and the paradigm he offers helps us devise sound design strategies for delivering a highly usable voice first interface: staying on topic, signalling shifts in topic, not speaking too long, not giving unnecessary information -- all of these are key ingredients of great voice first experiences.

But at the same time, to be more useful than harmful, Grice’s paradigm should be adopted only with a proviso in mind -- what we are calling the human-voicebot asymmetry proviso:

There will be many instances where it does not make sense for the human to observe human-to-human rules of conversation. The asymmetry proviso is this: even where the human is not expected to observe the human-to-human rules of conversation, the voicebot still is. For instance, the human should be able to interrupt the voicebot whenever they want, but the voicebot should never interrupt the human while the human is speaking. The human should be able to quit the conversation whenever and however they want, but the voicebot should not.
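The proviso can be captured directly in a voicebot’s turn-taking configuration. The sketch below is a hypothetical illustration in Python; the policy fields are our own names, not any platform’s settings.

    from dataclasses import dataclass

    # Hypothetical turn-taking policy encoding the asymmetry proviso:
    # every permission the human has, the voicebot deliberately lacks.
    @dataclass(frozen=True)
    class TurnTakingPolicy:
        user_may_barge_in: bool        # human may interrupt bot speech at any time
        bot_may_barge_in: bool         # bot must never interrupt the human
        user_may_quit_anytime: bool    # human may end the conversation at will
        bot_may_quit_unilaterally: bool

    ASYMMETRY_PROVISO = TurnTakingPolicy(
        user_may_barge_in=True,
        bot_may_barge_in=False,
        user_may_quit_anytime=True,
        bot_may_quit_unilaterally=False,
    )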

1 Searle, John. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, 1969.

2 Grice, Paul. “Logic and Conversation,” in The Logic of Grammar, D. Davidson and G. Harman (eds.), Encino, CA: Dickenson, 1975, pp. 64-75.

3 Harris, Randy Allen. Voice Interaction Design. Morgan Kaufmann Publishers, 2005, pp. 75-126.
