1.3. Road map to the book and other resources

While this book is organized as a logical progression from basic concepts to more difficult ones, we understand that many readers will need the information in a different order, depending on their particular task. We therefore provide this section to guide readers quickly to the information they seek.

1.3.1. How to use this book

This book is primarily intended for developers who want to write voice applications using VoiceXML. We have attempted to provide detailed language information and a broad collection of code examples. Of particular interest to the programmer will be Chapter 2, “VoiceXML essentials,” Chapter 3, “VoiceXML language reference,” and Chapter 4, “Enterprise voice application architecture.”

This book is also intended for higher-level managers and decision makers who need to understand the risks and challenges associated with developing and deploying voice applications, as well as the role VoiceXML plays in the voice application space. Of particular interest to the manager will be this chapter and Chapter 5, “Voice services,” and, to a lesser degree, Chapter 4, “Enterprise voice application architecture.”

The following gives a brief summary of the chapters of this book.

Chapter 1, “VoiceXML and voice services,” introduces the terminology and basic concepts. It also gets you started with setting up a VoiceXML development environment.

Chapter 2, “VoiceXML essentials,” discusses the important constructs of VoiceXML. It covers all of the essential skills required to build dialogs in VoiceXML, including:

  • how to collect user input,

  • how to generate responses,

  • how to write grammars,

  • how to control dialog flow.
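As a rough preview of these four skills, the sketch below touches each of them in a single document. It is only an illustration written for this road map, not an example taken from Chapter 2; the form names, the prompts, and the coffee-or-tea grammar are all invented here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <!-- Collect user input: a field holds one piece of information. -->
    <field name="drink">
      <!-- Generate a response: the prompt is rendered as audio. -->
      <prompt>Would you like coffee or tea?</prompt>
      <!-- Write a grammar: an inline XML grammar listing the legal answers. -->
      <grammar version="1.0" root="drink" mode="voice">
        <rule id="drink">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <prompt>You chose <value expr="drink"/>.</prompt>
        <!-- Control dialog flow: jump to another form in this document. -->
        <goto next="#goodbye"/>
      </filled>
    </field>
  </form>
  <form id="goodbye">
    <block>
      <prompt>Goodbye.</prompt>
      <exit/>
    </block>
  </form>
</vxml>
```

Each of these elements is treated properly in Chapter 2; the point here is only that a complete dialog needs very little markup.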

Chapter 3, “VoiceXML language reference,” provides an element-by-element reference to the entire VoiceXML language, including GRXML and SSML. This reference chapter is based on the VoiceXML 2.0 Specification but also contains legacy information pertaining to the VoiceXML 1.0 Specification.

Chapter 4, “Enterprise voice application architecture,” explores the integration of voice and data systems using VoiceXML with comprehensive examples. These examples are all complete and intended to provide some insight into the design process, architecture, and implementation of enterprise voice applications.

Chapter 5, “Voice services,” takes a close look at building voice services, as well as the components and protocols for deploying both local and enterprise systems. It revisits voice application design from a human-factors perspective and discusses the trade-offs of application development versus outsourcing. Finally, it concludes with a look at the voice application ecosystem, including other specifications and future directions for the VoiceXML and voice application fields.

1.3.2. Terminology

This section provides a quick primer on the jargon and acronyms that pervade the voice application industry.

Telephony services terms

Dual-Tone Multi-Frequency (DTMF)

Refers to the touch tones (0-9, *, and #) on a standard telephone.

Public Switched Telephone Network (PSTN)

This typically means “the telephone company.”

Interactive Voice Response (IVR)

This term can be used as an adjective, as in “my IVR application,” or as a noun referring to the actual IVR hardware, as in “the call is passed to the IVR.” A VoiceXML interpreter can be thought of as an IVR.
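To make the DTMF entry above concrete, here is a minimal sketch of a field that accepts touch-tone input only. The field name pin and the four-digit length are assumptions made for illustration; the inputmodes property and the built-in digits type are part of the VoiceXML 2.0 language.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <!-- Accept touch tones only; spoken input is not recognized here. -->
    <property name="inputmodes" value="dtmf"/>
    <!-- Built-in digits type, limited to exactly four key presses. -->
    <field name="pin" type="digits?length=4">
      <prompt>Please key in your four digit code.</prompt>
      <filled>
        <prompt>You entered <value expr="pin"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```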

Communication protocols

Voice Over Internet Protocol (VoIP)

(Pronounced “Voice over IP.”) This is a technology for sending voice data, such as phone calls, over an IP network, such as the Internet.

H.323

An ITU-T umbrella recommendation covering a family of packet-based multimedia communications protocols. H.323 specifies call signaling and media transport for VoIP and is used to carry both voice and video-conferencing data.

Session Initiation Protocol (SIP)

This is a protocol for setting up calls. It is an IETF standard, developed independently of H.323 as an alternative to it, and is starting to be implemented in bleeding-edge telephony products and infrastructure.

Media Gateway Control Protocol (MGCP)

A protocol with which a call-control server directs media gateways, the devices that convert telephone-network audio into IP-based packet data.

Call Processing Language (CPL)

An XML-based language for describing and controlling Internet telephony services. It is designed to work on top of signaling protocols such as SIP or H.323 rather than being tied to either.

Web protocols

Internet Protocol (IP)

The protocol that addresses and routes packets of data between computers on a network.

Transmission Control Protocol (TCP)

This protocol is responsible for verifying the correct delivery of data to its destination. TCP adds support to detect errors or lost data and to trigger retransmission until data is correctly and completely received at the destination.

Hypertext Transfer Protocol (HTTP)

This is a protocol that sits atop TCP/IP (the combined TCP and IP protocols). Originally designed for browser-to-Web-server communication, it is based on a request-response paradigm and is used by a VoiceXML interpreter to communicate with a document server.

Common Gateway Interface (CGI)

This is a standard interface through which an HTTP Web server invokes external programs to generate dynamic Web pages.

JavaServer Pages (JSP)

A language for embedding server-side Java code into pages served by a JSP-enabled HTTP Web server.

Active Server Pages (ASP)

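Microsoft's counterpart to JSP: a technology for embedding server-side script code into pages served by Microsoft's Web server.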

Hypertext Preprocessor (PHP)

PHP is a widely-used general-purpose scripting language that is especially suited for Web development and can be embedded into HTML.

XML-related terms

Document Type Definition (DTD)

XML markup declarations that define the structure and other properties of a class of XML documents.

Extensible Stylesheet Language Transformations (XSLT)

An XML-based language for writing stylesheets that specify how to transform one XML document into another.

Namespace

A way to specify the scope of names in an XML document.
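The DTD and namespace entries above both show up in the first few lines of a VoiceXML document. The following is a minimal sketch; the identifiers are those of the W3C VoiceXML 2.0 specification, but whether a given interpreter insists on them is platform dependent, so treat the prolog as something to check against your platform's documentation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- DTD: names the document type whose structure this file claims to follow. -->
<!DOCTYPE vxml PUBLIC "-//W3C//DTD VOICEXML 2.0//EN"
  "http://www.w3.org/TR/voicexml20/vxml.dtd">
<!-- Namespace: every unprefixed element below belongs to VoiceXML. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>Hello.</block>
  </form>
</vxml>
```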

Voice services

Automatic Speech Recognition (ASR)

A system that listens to an audio stream containing human speech and produces a symbolic representation of that speech. This can be implemented in hardware, software, or some combination of the two.

Text-To-Speech (TTS)

A system that takes a symbolic representation of speech (i.e., text) and renders it as audio.

Speaker Verification / Authentication

A technology that discerns how different people say the same words. This “voice printing” can be used to ensure a caller is who he says he is.

Form Interpretation Algorithm (FIA)

The FIA is an integral part of VoiceXML. It is the logic that drives the interaction between a user and a VoiceXML form (or menu). It controls how variables are initialized, when to enter and leave a form, which items to visit in a form, and other form-related logic. For a complete description of the algorithm, see Appendix C, “Form Interpretation Algorithm.”
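A small sketch may help show what “which items to visit” means in practice. In the form below (the form name, prompts, and two-city grammar are invented for illustration), the FIA visits the items in document order, skipping any item whose variable is already set. Clearing the variables in the filled handler makes the FIA go back and ask again.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="trip">
    <!-- Visited once: after it runs, the block counts as done. -->
    <block>Welcome to the trip planner.</block>
    <!-- Visited while the variable city is still undefined. -->
    <field name="city">
      <prompt>Which city?</prompt>
      <grammar version="1.0" root="city" mode="voice">
        <rule id="city">
          <one-of>
            <item>Boston</item>
            <item>Toronto</item>
          </one-of>
        </rule>
      </grammar>
    </field>
    <!-- Visited next, because city is now filled and confirm is not. -->
    <field name="confirm" type="boolean">
      <prompt>Did you say <value expr="city"/>?</prompt>
      <filled>
        <if cond="!confirm">
          <!-- Resetting both variables sends the FIA back to city. -->
          <clear namelist="city confirm"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

When the caller answers yes, no unfilled items remain and the form ends; when the caller answers no, the FIA selects the city field again on its next pass.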

1.3.3. More resources

There is a growing community of VoiceXML sites on the Web. The best jumping-off point is http://www.voicexml.org. This is where you can find the most recent Specification, as well as numerous links to other companies and websites.

While we have made an effort to keep this book as vendor-neutral as possible, if you want to start working with VoiceXML, you'll need to use one of the freely available products. Here are a few pointers.

http://developer.voicegenie.com

VoiceGenie makes a VoiceXML platform. They have two developer boxes that you can call into to test your applications. This service is free, but the call is a toll call to Toronto. You will need to create a login.

http://community.voxeo.com

Voxeo is a voice-hosting service. They provide a toll-free developer system that you can call into to test your applications. This service is free. You will need to create a login.

http://www-3.ibm.com/pvc/products/voice/voice_technologies.shtml

IBM's WebSphere product is VoiceXML enabled. They provide a free download of their WebSphere Voice Server SDK 2.0 and the WebSphere Voice Toolkit 2.0. These two products comprise a speech server and a VoiceXML development environment (a full-fledged VoiceXML IDE) that you can run entirely on your desktop machine (CPU permitting).

http://extranet.nuance.com

This is Nuance's developer site. Here you can download developer-licensed versions of Nuance 8, their main ASR product; Vocalizer, their TTS product; and V-Builder, their VoiceXML IDE. Once installed, these components allow you to write and test VoiceXML on your desktop PC. These are free downloads. You may need to create a login. You may also need to download the NT patch for Vocalizer from this Nuance website.

http://www.speech.cs.cmu.edu/openvxi/OpenVXI_2.0.1/Readme.html

This is the main page for the Open VXI project, originally started in the Carnegie Mellon University speech research group, then taken over by Speech Works, Inc. (http://www.speechworks.com). Open VXI is an open-source VoiceXML interpreter.

http://www.heyanita.com

HeyAnita is a voice-hosting service. They provide a toll-free developer system that you can call into to test your applications. This service is free. You will need to create a login.

http://www.telera.com

Telera is a VoiceXML platform company. They provide a toll-free developer system that you can call into to test your applications. This service is free. You will need to create a login.

1.4. Getting started

It's “Hello World” time! There are two routes you can take to get up and running with VoiceXML. The first is to set up an account with one of the online VoiceXML development services and then set up a directory that can be accessed from the Internet where you will keep your VoiceXML files. The second is to download a VoiceXML interpreter, an ASR engine, and a TTS engine, and install these on your desktop PC.

Depending on which route you take you should look at either 1.4.1, “Setting up a remote hosted environment,” or 1.4.2, “Setting up an IDE environment,” to get your environment up and running.

1.4.1. Setting up a remote hosted environment

Using a remote hosted environment is probably the easiest way to get up and running quickly. It does, however, require both Internet and telephone access during the testing process.

Setting up this environment consists of two steps: finding server space for your VoiceXML documents, and signing up with a VoiceXML development platform that will run them. For the first step, the best solution is to have access to a Web server that can be reached by domain name or IP address from the Internet. The next best solution is to use a free Web-hosting service. It is important to find one that doesn't litter your pages with advertisements, because the HTML such services inject is not VoiceXML and will cause the VoiceXML interpreter to reject your document.

Let's go ahead and assume you don't have a Web server of your own and would like to develop on one of the free hosting services and VoiceXML development platforms. The following example will use GeoCities as our free Web server and VoiceGenie as our VoiceXML development platform.

We'll start by setting up an account on GeoCities. To do this you'll need to follow the “Sign up for a free website” link on their home page at http://www.geocities.com. This will take you through the registration process. Once you have completed this and are fully logged in, you'll want to go to the File Manager application. You can do this by going to http://geocities.yahoo.com/filemanager. Here you will see a Web-based file manager. You'll want to click on “New (Create a new HTML file)”. This will open up a Web-based text editor with some skeleton HTML markup. Delete all of this text and enter instead the contents of Example 1-1.

Example 1-1. hello.xml
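<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <!-- type="boolean" selects the built-in yes/no grammar -->
    <field name="hello" type="boolean">
      <prompt>Isn't this exciting?</prompt>
      <filled>
        <prompt>
          You said <value expr="hello"/>
        </prompt>
      </filled>
    </field>
  </form>
</vxml>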

There should be a text field labeled “Filename:”. Enter hello.xml. Now press the button labeled “Save”. Note that we used the .xml extension instead of the more typical .vxml extension. This is because many of the free sites insist that you use a common Web extension and don't recognize .vxml. VoiceXML interpreters rarely care what the file is called and .xml certainly is as accurate as .vxml, if not as specific.

You have now published your first VoiceXML document to the Web. To verify this, visit your new VoiceXML website by typing http://www.geocities.com/yourname/hello.xml into the address field of your browser, replacing yourname with the account name you chose when creating your GeoCities account. You should see more or less what you typed in. Depending on your browser settings you may see only:

Isn't this exciting? You said

For example, in Internet Explorer, you can see the entire source by opening the View menu and choosing Source.

The next step is testing your application using one of the free VoiceXML development platforms. Let's use the VoiceGenie platform as it is relatively easy to use. This is a VoiceXML interpreter running on a computer with telephony hardware. You can call in to this machine over the telephone and interact with your VoiceXML application.

In order to test your application you will need to create an account on VoiceGenie's development server. You can do this by visiting http://developer.voicegenie.com and clicking on the “Register” link. This will guide you through the account creation process.

Next, you will need to assign an “extension” to your application. An extension is just a five-digit number that you dial after dialing in to the VoiceGenie server. To assign a new extension, click on the tab labeled “Tools”. You will then see a link labeled “Extension Manager”. Click on this link.

The table showing all of your extensions will be empty. At the bottom of this table will be a text field labeled “Add:”. Type into this text field the same URL you used in your browser to look at hello.xml, namely http://www.geocities.com/yourname/hello.xml. Click the Add button. You should now see a five-digit number followed by that URL.

Now you will need to pick up the phone and dial the telephone number for one of their development boxes. They have two boxes configured using different TTS and ASR technologies. For this application, either one should work. When you are connected you will hear a welcome message and then you will be asked for your extension. When you say the five-digit extension shown in the table, you will be transferred to a VoiceXML interpreter running your application. Your dialog with the interpreter might be similar to the one in Example 1-2.

Example 1-2. Interaction with hello.xml
Interpreter : Isn't this exciting?
You         : Yes.
Interpreter : You said yes.

You now have a hosted VoiceXML environment suitable for developing static VoiceXML applications. As we go on to the examples that require dynamic document generation technologies such as ASP and JSP, you will need to find a more sophisticated server environment.

In addition, you can try different VoiceXML hosts for your hello.xml file now that it is on the Web. The process for creating developer accounts and assigning extensions is pretty much the same for all of the voice-hosting service providers.
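To give a feel for why dynamic examples need more than static file hosting, here is a hedged sketch of hello.xml extended to send the caller's answer to a server-side script. The URL is a placeholder invented for illustration, not a real service; a free static host of the kind used above would have nothing to run at that address.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <field name="hello" type="boolean">
      <prompt>Isn't this exciting?</prompt>
      <filled>
        <!-- Send the answer to a server-side script over HTTP.
             The script must reply with a new VoiceXML document.
             The URL is a placeholder, not a real service. -->
        <submit next="http://www.example.com/cgi-bin/hello.cgi"
                namelist="hello" method="get"/>
      </filled>
    </field>
  </form>
</vxml>
```

The interpreter issues an ordinary HTTP request with the hello variable attached, then interprets whatever VoiceXML the server returns, which is the request-response pattern described in the terminology section.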

1.4.2. Setting up an IDE environment

If you want to test right on your desktop PC, you'll need to download a VoiceXML IDE (Integrated Development Environment) system. This will need to include ASR, TTS, a VoiceXML interpreter, and optionally some sort of development tools. The two most mature candidates in this arena are IBM's WebSphere Voice Toolkit and Nuance's V-Builder.

There are a few caveats with this approach. First, the downloads are enormous! (On the order of hundreds of megabytes for all of the ASR and TTS data.) Second, they require considerable CPU power and RAM to run properly. A third issue is the tedium of installation. None of the packages has a “one-button” installer, but instead they require you to find the right packages and install them in the proper order. This can be time consuming and frustrating.

The advantage of this approach is that your development environment is completely self-contained. You don't need Internet connectivity, nor do you need to call into a VoiceXML interpreter every time you test your application, which, over the long haul, might prove to be more frustrating. These IDE products provide a telephone-simulation mode in which you do not have to use a telephone, though you will need a headset and microphone connected to your PC.

To install IBM's IDE, Voice Toolkit, you will also need to download their Voice Server SDK.

You can start by going to http://www-3.ibm.com/pvc/products/voice/voice_technologies.shtml and scrolling down the page for information about both products. You must also download at least one language package along with the main installation package. Once the Voice Server SDK has been installed, you can remove the download package(s) and the extracted installation program files.

To launch the IBM WebSphere Voice Server SDK 2.0 Installation Wizard, run the setup.exe file, which is located in the directory where you unpacked the installation package. Follow the instructions in the Installation Wizard to install the SDK.

Repeat the procedure for downloading Voice Toolkit. Run the setup.exe file to begin the Installation Wizard and follow the instructions.

You are now ready to develop VoiceXML applications on your PC, using telephone simulation in place of telephone connectivity. The Voice Toolkit IDE provides the following:

  • VoiceXML editor,

  • VoiceXML debugger,

  • grammar editor,

  • grammar test tool,

  • pronunciation builder,

  • built-in audio recorder,

  • VoiceXML reusable dialog components,

  • speech recognition engine,

  • Text-To-Speech engine.

1.4.3. Conclusions

We've now gotten our first VoiceXML application to work. In roughly a dozen lines of code we've demonstrated TTS, ASR, and a trivial call flow. The next chapter will pick up where this one left off, starting with simple forms and going on to cover all of the major language features of VoiceXML.
