2.3. Prompts and audio output

A prompt element controls the output of pre-recorded audio and synthesized speech. It allows the two forms of audio output to be combined or, in some cases, allows synthesized speech to serve as an alternative when pre-recorded audio is not available.

This section will cover the different ways to output audio and will also highlight some of the differences between the VoiceXML 2.0 Draft and the VoiceXML 1.0 Specification.

2.3.1. The prompt element type

Though prompt is used to control audio and speech output, it is not always necessary to explicitly use the <prompt> and </prompt> tags. The prompt tags are needed only when:

  • speech markup elements are used,

  • audio and synthesized speech are interleaved, or

  • any attributes of the prompt element are needed.

The following two examples are valid prompts that do not require prompt tags:

  • <audio src="greeting.wav"/>

  • Hello, welcome to our new voice menu.
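
By contrast, prompt tags are required as soon as one of the conditions above applies. The following is a minimal sketch of a prompt that interleaves pre-recorded audio with synthesized speech and also sets an attribute; the file name welcome.wav is only illustrative:

<prompt bargein="false">
  <audio src="welcome.wav"/>
  Please listen carefully, as our menu options have changed.
</prompt>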

If a prompt is repeated, perhaps because the user failed to respond properly to a question, a different version of the prompt can be used. This technique is referred to as tapered prompts.

The prompt's count attribute indicates on which attempt the prompt becomes active. The most recently activated prompt is repeated until a prompt with a higher count becomes active or until the user responds with a valid answer. In Example 2-19, the prompt, "What is your preferred credit card?" will be spoken on the first two attempts; from the third attempt on, the second prompt, "Say either Visa, Master Card, American Express, or other," will be repeated until the user says a valid response.

Example 2-19. Tapered prompts
<field name="getCreditCardType">
  <grammar src="validCreditCards.grxml"/>
  <prompt count="1"> 
    What is your preferred credit card? 
  </prompt>
  <prompt count="3"> 
    Say either Visa, Master Card, American Express, or other.
  </prompt>
</field>

A dialog scenario for Example 2-19 might go as shown in Example 2-20. Note that since the field defines no <catch event="nomatch"></catch> or <nomatch></nomatch> handler, the platform typically provides a default nomatch handler that plays a message like "I don't understand."

Example 2-20. A dialog scenario demonstrating tapered prompts
IVR     : What is your preferred credit card?
Human   : Discover.
IVR     : I don't understand. What is your preferred credit card?
Human   : Discover.
IVR     : I don't understand. Say either Visa, Master Card, American 
          Express, or other.
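
If the default message is not appropriate, an explicit nomatch handler can be added to the field. The following is a minimal sketch that extends Example 2-19; the wording of the error message is only illustrative:

<field name="getCreditCardType">
  <grammar src="validCreditCards.grxml"/>
  <prompt count="1">
    What is your preferred credit card?
  </prompt>
  <prompt count="3">
    Say either Visa, Master Card, American Express, or other.
  </prompt>
  <nomatch>
    Sorry, we do not accept that card.
    <reprompt/>
  </nomatch>
</field>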

Another attribute of prompt is cond, which causes the prompt to play only if its condition evaluates to true. In Example 2-21, the prompt is played only for a novice user. Notice that the example also uses the attribute bargein. Setting it to false prevents the caller from interrupting the playback of this prompt. This could be used to force a novice user to listen to the entire message before attempting to answer. Disabling barge-in is also useful for preventing background noise in the caller's environment from interrupting playback.

Example 2-21. A prompt that prevents barge-in
<prompt cond="noviceMode=='true'" bargein="false">
  We accept the following credit cards: Visa, Master Card, or 
  American Express. You should say the name of the credit card 
  that you will be using for this transaction.
</prompt>
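
Example 2-21 assumes that a variable named noviceMode is in scope when the prompt is evaluated. As a minimal sketch, such a variable might be declared at document scope and the prompt placed inside a field; the variable name and the value assigned to it here are only illustrative:

<vxml version="2.0">
  <var name="noviceMode" expr="'true'"/>
  <form id="payment">
    <field name="getCreditCardType">
      <grammar src="validCreditCards.grxml"/>
      <prompt cond="noviceMode=='true'" bargein="false">
        We accept the following credit cards: Visa, Master Card, or
        American Express.
      </prompt>
      <prompt>
        What is your preferred credit card?
      </prompt>
    </field>
  </form>
</vxml>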

2.3.2. The audio element type

An audio element instructs the VoiceXML interpreter to play an audio response, either pre-recorded audio or synthesized speech, to the user. An audio element can contain plain text, speech markup, or a reference to digitized audio. The VoiceXML Specification recommends minimum support for 8-bit, 8 kHz raw PCM and WAV recorded files.

Example 2-22 shows all three forms of speech output.

Example 2-22. A prompt combining plain text, speech markup, and digitized audio
<prompt>
  Here is a text message.
  <audio src="goodbye.wav"> 
    <emphasis>Goodbye</emphasis> 
    thank you for your patronage. 
  </audio>
</prompt>

Recall from 2.3.1, “The prompt element type,” on page 47, that prompt elements can contain any form of speech data. "Here is a text message" will be rendered to speech using text-to-speech conversion. In the next line, an audio element opens with the attribute src="goodbye.wav". If the file goodbye.wav exists, it will be played by an interpreter-supported audio player. If goodbye.wav does not exist, the speech markup "<emphasis>Goodbye</emphasis> thank you for your patronage" will be rendered using text-to-speech. Thus, the content placed within an audio element provides alternate audio when the referenced file is not available.

Some care should be taken when incorporating audio into VoiceXML applications. Some audio file formats support streaming while others do not. Streamable formats are preferred because they allow a file to be played while it is still being transferred; non-streamable formats require the complete file to be transferred before playback can begin.

Simply referencing an audio file of a streamable format in an audio element does not guarantee that it will in fact be streamed. How it is handled is interpreter-dependent. Some interpreters provide attribute extensions for streamable formats that enable playback while the file is still downloading.

Another attribute of the audio element type, fetchhint, specifies when a digitized audio resource is downloaded from the server. The values supported are safe and prefetch. The value safe defers fetching the file until the audio element actually needs to be played, while prefetch allows the audio file to be downloaded at the time the VoiceXML document itself is downloaded.
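
As a minimal sketch, the attribute is set directly on the audio element; the file names are only illustrative:

<prompt>
  <audio src="welcome.wav" fetchhint="prefetch"/>
  Please hold while we look up your account.
  <audio src="hold-music.wav" fetchhint="safe"/>
</prompt>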

2.3.3. The say-as element type

As mentioned in the previous section, VoiceXML allows for speech markup tags, such as emphasis, to be inserted around text. There is another speech markup element type, say-as, that specifies how speech data is to be rendered. It supports the following built-in data types:

  • acronym,

  • address,

  • currency,

  • date,

  • duration,

  • measure,

  • name,

  • net,

  • number,

  • telephone,

  • time.

Example 2-23 demonstrates the use of say-as with the built-in currency data type.

Example 2-23. A say-as element used to control how speech is rendered
<prompt>
  Your purchase total of <say-as class="currency"> $200.58 </say-as>
  has been processed.
</prompt>

The built-in data type currency will ensure that the total is spoken as "two hundred dollars and fifty-eight cents." Without a say-as element, the rendering of this example would be determined by the text-to-speech synthesizer. For instance, without any additional support from the text-to-speech converter, the total might be spoken as "dollar two hundred point fifty-eight."
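
The other built-in data types listed above are used in the same way. As a minimal sketch, the date and telephone classes might be applied as follows; the values shown are only illustrative:

<prompt>
  Your order was placed on
  <say-as class="date"> 2002-05-14 </say-as>.
  To check its status, call
  <say-as class="telephone"> 800-555-0123 </say-as>.
</prompt>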

In VoiceXML 1.0 there was a sayas element, which was replaced by say-as in VoiceXML 2.0. sayas supported the following, more limited, set of built-in data types (a comparative sketch follows the list):

  • currency,

  • date,

  • digits,

  • literal,

  • number,

  • phone.
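
As a minimal sketch of the difference, a VoiceXML 1.0 sayas element and its VoiceXML 2.0 say-as counterpart might look like this; the class names come from the lists above, and the phone number is only illustrative:

<!-- VoiceXML 1.0 -->
<sayas class="phone"> 8005550123 </sayas>

<!-- VoiceXML 2.0 -->
<say-as class="telephone"> 8005550123 </say-as>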

For further details on say-as see 3.48, “say-as,” on page 272.

2.3.4. Speech markup element types

VoiceXML 2.0 has deprecated the VoiceXML 1.0 speech markup element types emp, div, pros, and sayas. These element types were based on the Java Speech Markup Language, JSML. They are replaced with similar element types, along with additional speech markup element types that are now based on the W3C Speech Synthesis Markup Language. The new speech markup element types are part of the draft DTD, but defer to the W3C Speech Synthesis Markup Language as the definitive specification.

The element types in VoiceXML 2.0 that replace the deprecated speech markup element types in VoiceXML 1.0 are these:

  • emphasis replaces emp,

  • prosody replaces pros,

  • sentence and paragraph replace div,

  • say-as replaces sayas.

Additional new speech markup element types are phoneme, voice, and mark. The break element type has seen little or no change between the two versions. For details on all these element types, see Chapter 3, “VoiceXML language reference,” on page 136.
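
As a minimal sketch, several of these VoiceXML 2.0 speech markup element types might be combined in a single prompt; the pause length and prosody rate shown are only illustrative:

<prompt>
  <paragraph>
    <sentence>
      Thank you for calling. <break time="500ms"/>
      <prosody rate="slow"> Please listen carefully. </prosody>
    </sentence>
    <sentence>
      Our <emphasis> new </emphasis> menu options follow.
    </sentence>
  </paragraph>
</prompt>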
