2.6. Mixed initiative forms

The forms that we have seen so far fall into the category of directed dialogs: the computer works through a preset list of questions and the user simply answers them one after another. At the time this is being written, almost all of the interactive voice response applications in production are computer-directed. The next generation of voice applications will most likely not be computer-directed, but rather mixed initiative.

2.6.1. Mixed initiative defined

The term mixed initiative refers to the ability of either the computer or the user to drive the conversation. For example, in the middle of a question-answer session, such as filling out the Jimmy's Pizza questionnaire, the user could break from the predefined sequence by answering a question out of order, or by saying a keyword that indicates which question the user would prefer to answer at that point in the conversation.

Let's revisit the Jimmy's Pizza questionnaire form introduced in 2.1.1, “The form defined: a customer satisfaction survey form,” on page 26. This computer-directed dialog collects the three main facts that we care about, namely:

  • the quality of the food,

  • whether or not the service was courteous, and

  • the speed of service.

But what if, for some reason, the user couldn't or didn't want to answer these questions strictly as the application has them ordered? For example, what if the service was so slow at Jimmy's Pizza that the user just got up and left? The user might be inclined to try to answer the questionnaire as in Example 2-39.

Example 2-39. Why mixed initiative forms are useful
IVR     : Would you rate the quality of the food at the restaurant 
          as Excellent, Good, Fair, or Poor?

Human   : The service was so slow I never got my food!

IVR     : I'm sorry I don't understand. Would you rate the quality 
          of the food at the restaurant as Excellent, Good, Fair, 
          or Poor?

Human   : None of the above.

IVR     : I'm sorry I don't understand. Would you rate the quality 
          of the food at the restaurant as Excellent, Good, Fair, 
          or Poor?

Because of the rigid form structure, the user can't get past the first question! This frustrating scenario could have been avoided if the computer simply picked up on the word "service" or "slow" in the user's first response. These words clearly refer to Question 3: Speed of Service.

A mixed initiative version of the Jimmy's Pizza customer survey would listen for answers to multiple questions simultaneously. As a result it might proceed more like Example 2-40.

Example 2-40. Possible mixed initiative dialog for Jimmy's Pizza questionnaire
IVR     : Would you rate the quality of the food at the restaurant 
          as Excellent, Good, Fair, or Poor?

Human   : The service was so slow I never got my food!

IVR     : I assume that on a scale from zero to nine, you would rate
          the speed of service as zero. Were the employees who 
          served you courteous?

Human   : Yes.

IVR     : I'm sorry you experienced problems at Jimmy's Pizza.
          Thank you for taking this survey.

In the context of VoiceXML, a mixed initiative dialog is a form with at least one initial and at least one form-level grammar. The following sections will explain how these elements behave and how the VoiceXML author can use them to create more sophisticated voice interfaces.

2.6.2. The initial element type

If a form contains an initial, this element will be visited before all other form items. After visiting an initial, the interpreter will wait for a form-level grammar to be satisfied. Once an answer is given that satisfies the form-level grammar, the interpreter will attempt to fill any fields remaining unfilled using the standard Form Interpretation Algorithm.

The trick to making a form mixed initiative is to provide a grammar that can answer any of the questions represented by the fields of the form. Let's consider the Jimmy's Pizza conundrum that arises when the customer never gets his food. In this situation we still want to start our questionnaire with:

Would you rate the quality of the food at the restaurant as 
Excellent, Good, Fair, or Poor?

Since this is the first question of a mixed initiative form, this prompt will go in the initial element within the form in Example 2-41.

Example 2-41. An initial element
<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="1.0">
  <form id="form1">
    <initial> 
      Would you rate the quality of the food at the restaurant as
      Excellent, Good, Fair, or Poor?
    </initial>
  </form>
</vxml>

Upon entering this form, the VoiceXML interpreter will ask the caller to rate the quality of the food. Now we need to construct a form-level grammar to handle the answer to this question.

2.6.3. Form-level grammars

A form-level grammar is defined by a grammar that is an immediate child of the form. This grammar is active throughout the entire execution of the form. In a form with an initial, a form-level grammar must be present. If it is absent, the VoiceXML interpreter will simply execute the contents of initial and then wait forever, as there are no active grammars to map user utterances to actions.

The form-level grammar for the mixed initiative Jimmy's Pizza questionnaire will need to listen not only for the words "excellent," "good," "fair," and "poor," but also for any phrase that might indicate that the food didn't arrive in the first place. Responses to this effect may include:

  • "It never arrived."

  • "My food never arrived."

  • "I/We never got my/our food."

If the caller simply answers the question by saying "excellent," "good," "fair," or "poor," we would want the form-level grammar to fill in only the quality of food field and then use the Form Interpretation Algorithm to attempt to fill the other fields. If, however, one of these "my food never arrived" phrases is recognized, we would want to fill in two field values:

  • speed of service should be set to 0, to indicate the food never came; and

  • quality of food should be set to "unknown" since the customer never got to find out!

The grammar shown in Example 2-42, used as a child of the form element, defines a Nuance GSL format grammar that will do approximately what we need. GSL is a proprietary grammar specification language defined by the speech technology company Nuance.

Example 2-42. Defining an in-line grammar element
<grammar>
<![CDATA[
[
 [excellent]{<foodOK "excellent">};
 [good]{<foodOK "good">};
 [fair]{<foodOK "fair">};
 [poor]{<foodOK "poor">};
 [ (?(my food) (never arrived)) (never got [our my] food)]
           {<speedOfService "0"> <foodOK  "unknown">}
]
]]>
</grammar>

The grammar definition is enclosed within a grammar element. Since the GSL format is not XML and contains symbols that an XML parser would find offensive, we enclose the GSL grammar definition in a CDATA block. This is an XML construct for representing literal "character data." This data will be preserved byte-for-byte throughout the XML interpreting process. For more information on CDATA, see The XML Handbook.
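To see what the CDATA block saves us, here is a sketch of the first rule of the same grammar written without one. It is only an illustration, and it relies on the standard XML rule that a literal < in character data must be written as &lt; (writing > as &gt; is optional here, but keeps the two symmetric).

```xml
<!-- Illustrative only: the first rule of Example 2-42 without CDATA.
     Every angle bracket of the GSL slot assignment is escaped by hand. -->
<grammar>
[
 [excellent]{&lt;foodOK "excellent"&gt;}
]
</grammar>
```

After XML parsing, the interpreter receives the same GSL text as before, but each slot assignment has to be escaped manually, which is why the CDATA form is easier to read and maintain.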

We will defer discussion of writing grammars to 2.9, “Grammars,” on page 106. For the sake of this discussion it suffices to understand that this GSL grammar roughly translates to:

  • If the caller utters the word "excellent," fill the foodOK field with the value excellent.

  • If the caller utters the word "good," fill the foodOK field with the value good.

  • If the caller utters the word "fair," fill the foodOK field with the value fair.

  • If the caller utters the word "poor," fill the foodOK field with the value poor.

  • If the caller tells us that his food never arrived, fill the foodOK field with the value unknown and fill the speedOfService field with the value 0.

With this form-level grammar in place along with the initial element discussed in the previous section, we can expect the VoiceXML interpreter to begin this dialog by asking the caller about the quality of the food and waiting for the caller's response. If the caller answers this question directly, by saying "excellent," "good," "fair," or "poor," then the foodOK field will be filled and the Form Interpretation Algorithm will be used to fill the remaining unfilled fields. If, however, the caller matches the last grammar rule, the one that recognizes phrases meaning the food never arrived, the form-level grammar will fill both the foodOK and speedOfService fields. The Form Interpretation Algorithm will then only need to ask whether the staff was courteous.

2.6.4. A mixed initiative form's fields

The remainder of the mixed initiative form is really no different from a computer-driven form: it has a field element for each field that needs to be filled in, and these fields may have filled children that are executed when the field is filled, whether by the form-level grammar or by the Form Interpretation Algorithm.

Example 2-43 is the mixed initiative VoiceXML form in its entirety.

Example 2-43. The entire mixed initiative questionnaire form
<vxml version="1.0">
  <form id="form1" >
    <initial name="init">Would you rate the quality of the food 
      at the restaurant as Excellent, Good, Fair, or Poor?</initial>

    <grammar>
    <![CDATA[
    [
     [excellent]{<foodOK "excellent">};
     [good]{<foodOK "good">};
     [fair]{<foodOK "fair">};
     [poor]{<foodOK "poor">};
     [(?(my food) (never arrived)) (never got [our my] food)]
                          {<speedOfService "0"> <foodOK "unknown">};
    ]
    ]]>
    </grammar>

    <field name="foodOK">
      <filled>Got it! The food was <value expr="foodOK"/>.</filled>
    </field>

    <field name="courteousService" type="boolean">
      <prompt>Were the wait staff courteous?</prompt>
      <filled>
        <prompt cond="courteousService==true">
          Got it! You experienced courteous service.
        </prompt>
        <prompt cond="courteousService==false">
          Got it! You did not experience courteous service.
        </prompt>
      </filled>
    </field>

    <field name="speedOfService" type="number">
      <prompt>How fast was your service on a scale of 0 to 9?</prompt>
      <filled>
        <prompt>Got it!</prompt>
        <if cond="speedOfService==0">
          <prompt>You never received your food.</prompt>
        <else/>
          <prompt>
            You rated the speed of service to be 
            <value expr="speedOfService"/> on a scale of 0 to 9. 
          </prompt>
        </if>
      </filled>
    </field>

    <filled mode="all">
      <if cond="((foodOK == 'poor') || (courteousService == false)
                                 || (speedOfService &lt; 5))">
        <prompt>
          I'm sorry that you experienced problems at Jimmy's Pizza.
        </prompt>
      <else/>
        <prompt>
          I'm glad that your experience at Jimmy's Pizza was, on the
          whole, satisfactory.
        </prompt>
      </if>
      <prompt>Thank you for taking this survey.</prompt>
    </filled>

  </form>
</vxml>

Comparing this with the computer-driven form in 2.1.1, “The form defined: a customer satisfaction survey form,” on page 26 we see that this part of the VoiceXML document is more or less the same. Fields that may not be filled by the form-level grammar should have a declared prompt and a grammar so the Form Interpretation Algorithm can visit these fields individually.
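As a minimal sketch of that advice, the foodOK field could be given its own prompt and a field-level grammar so that it can also be asked on its own. The prompt wording and the reuse of the four rating rules from Example 2-42 are assumptions made for illustration; they are not part of Example 2-43.

```xml
<field name="foodOK">
  <!-- Assumed prompt: played only when the FIA visits this field directly -->
  <prompt>
    How was the food? Please say Excellent, Good, Fair, or Poor.
  </prompt>
  <!-- Field-level grammar: active only while this field is listening.
       The four rules are copied from Example 2-42. -->
  <grammar>
  <![CDATA[
  [
   [excellent]{<foodOK "excellent">};
   [good]{<foodOK "good">};
   [fair]{<foodOK "fair">};
   [poor]{<foodOK "poor">};
  ]
  ]]>
  </grammar>
  <filled>Got it! The food was <value expr="foodOK"/>.</filled>
</field>
```

With a field like this, a caller who reaches foodOK during a field-by-field pass hears a question specific to that field instead of depending on the initial prompt.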

2.6.5. Testing the mixed initiative questionnaire

Let's consider two call scenarios. First, Example 2-44 will demonstrate how the mixed initiative form handles the exceptional case where the caller's food never arrived.

Example 2-44. A mixed initiative dialog using the document in Example 2-43
IVR     : Would you rate the quality of the food at
          the restaurant as Excellent, Good, Fair, or Poor?
Caller  : My food never arrived.

Both foodOK and speedOfService are filled by the form-level grammar.

IVR     : Got it! The food was unknown.
          Got it! You never received your food.
          Were the wait staff courteous?
Caller  : Yes.
IVR     : Got it! You experienced courteous service.
          I'm sorry that you experienced problems at Jimmy's Pizza.
          Thank you for taking this survey.

This form will also handle the normal case where the user simply answers the initial question (Example 2-45).

Example 2-45. Another mixed initiative dialog using the document in Example 2-43
IVR     : Would you rate the quality of the food at
          the restaurant as Excellent, Good, Fair, or Poor?
Caller  : Excellent.

In this case only foodOK is filled by the form-level grammar.

IVR     : Got it! The food was excellent.
          Were the wait staff courteous?
Caller  : Yes.
IVR     : Got it! You experienced courteous service.
          How fast was your service on a scale of 0 to 9?
Caller  : 8.
IVR     : Got it! You rated the speed of service to be 8 on a
          scale of 0 to 9.
          I'm glad that your experience at Jimmy's Pizza was, on the
          whole, satisfactory.
          Thank you for taking this survey.

2.6.6. Handling errors

If the caller's response to the initial prompt doesn't match the form-level grammar, the initial prompt will replay. Let's say we want the dialog to fall back into a computer-driven dialog mode in this case. In other words, if the caller doesn't answer the initial prompt correctly, they will be prompted for each field individually.

In order to do this, we need to associate a catch with initial to catch either a nomatch or noinput event. The contents of this catch need to do three things:

  • play an error message to the user;

  • indicate to the interpreter that the initial no longer needs visiting; and

  • ask the interpreter to re-interpret the form taking into account initial's new state.

To indicate to the interpreter, or more specifically to the Form Interpretation Algorithm (FIA), that initial no longer needs visiting, we set its form item variable to true. As shown in Example 2-46, this is done by assigning true to the variable named by initial's name attribute.

Example 2-46. Controlling the visit logic for an initial element
<initial name="init"> 
  Would you rate the quality of the food at the restaurant as
  Excellent, Good, Fair, or Poor?

  <catch event="nomatch noinput">
    I didn't understand that answer. 
    <assign name="init" expr="true"/>
    <reprompt/>
  </catch>
</initial>

Note that the catch is not a form-level catch, but rather a child of initial. This means that this catch will be active only while listening for the caller's response to the initial prompt.
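A gentler variation, sketched here as an assumption rather than as part of the Jimmy's Pizza application, gives the caller another try before falling back. It uses the count attribute of catch, so the first occurrence of an event only offers a hint, and a repeat of the same event marks init as visited. The hint wording is invented for illustration.

```xml
<initial name="init">
  Would you rate the quality of the food at the restaurant as
  Excellent, Good, Fair, or Poor?

  <!-- First occurrence of either event: give a hint and ask again -->
  <catch event="nomatch noinput" count="1">
    Please say Excellent, Good, Fair, or Poor, or tell me
    that your food never arrived.
    <reprompt/>
  </catch>

  <!-- Second occurrence of the same event: stop visiting the initial
       and let the FIA ask for each field individually -->
  <catch event="nomatch noinput" count="2">
    I still didn't understand. Let's take the questions one at a time.
    <assign name="init" expr="true"/>
    <reprompt/>
  </catch>
</initial>
```

Because event counts are tracked per event, a caller who stays silent once and then mumbles once hears the hint twice before the fallback is triggered.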

2.6.7. Conclusions

The ideal mixed initiative dialog would interact with the caller in a much more human-like manner. Imagine a dialog that could understand any reasonable answer to the question "What do you remember about your experience at Jimmy's Pizza?" We could imagine the caller responding with any number of responses, including:

  • "The food was good but the service was slow";

  • "The food was terrible and the wait staff were rude"; or simply

  • "Everything was good."

The difficulty in designing a voice application like this one lies in writing a grammar that anticipates the most likely responses. Once you have tackled that task, VoiceXML's Form Interpretation Algorithm takes care of prompting for whatever the caller left unanswered, which simplifies the job of ensuring that all questions are answered.
