Chapter 4. Enterprise voice application architecture

Up until now, most of our voice applications have been implemented as static VoiceXML documents that are visited by the VoiceXML interpreter and rendered to a caller. The early days of the World Wide Web were similar: a web of static HTML pages that linked to one another. With the advent of CGI (Common Gateway Interface) applications, then later JSP, ASP, PHP, and a whole new breed of platforms called application servers, the Web became increasingly dynamic. Since VoiceXML, like the Web, is HTTP-based, much of this dynamic technology can be used for voice applications. This section will demonstrate dynamic page technologies with examples of voice applications using VoiceXML.
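As a minimal sketch of the idea, the Python fragment below builds a VoiceXML page per request, much as a CGI script or server page would. The function name, the caller name, and the balance are hypothetical and chosen only for illustration; any server-side technology that can emit XML over HTTP could play the same role.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def render_greeting(caller_name: str, balance: float) -> str:
    """Return a small VoiceXML document built from per-caller data.

    The data (caller_name, balance) is assumed to come from some
    back-end lookup that is not shown here.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "greeting"})
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    # ElementTree escapes special characters such as '&' on output.
    prompt.text = f"Hello {caller_name}. Your balance is {balance:.2f} dollars."
    return '<?xml version="1.0"?>\n' + ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # A real deployment would write this to the HTTP response body.
    print(render_greeting("Ann", 42.0))
```

The interpreter sees only the resulting document, so from its point of view a generated page is indistinguishable from a static one.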

The existing technologies for generating dynamic Web content apply naturally to the generation of dynamic VoiceXML. This is fortunate, as many of the problems associated with dynamic HTTP content delivery have been solved, including:

persisting session information

The notion of sessions is built into most sophisticated dynamic Web page frameworks, including Java Servlets/JSP, ASP, and PHP.

separation of presentation logic from business logic

For example, the J2EE framework encourages the use of Servlets and JSPs for presentation logic and EJBs for business logic; Microsoft's .NET framework similarly uses ASP files or classes for presentation and components for business logic. In the XML world, an XML database and/or application server provide the business logic and a transformation engine using XSLT provides the presentation logic.

high availability

High-capacity application servers provide a fail-over strategy, so when one server goes down the others can pick up the user sessions without affecting the end user's experience.

scalability

An application does not have to be rewritten in order for it to be distributed over more physical server machines.
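The first two items in the list above, session persistence and the separation of presentation from business logic, can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular framework does it: the in-memory session store, the function names, and the "visits" counter are all hypothetical, and a real application server would supply its own session mechanism.

```python
import uuid
from typing import Dict, Optional, Tuple

# Hypothetical in-memory session store keyed by session ID.
_sessions: Dict[str, dict] = {}


def get_session(session_id: Optional[str]) -> Tuple[str, dict]:
    """Return the existing session, or create a new one if none matches."""
    if session_id is None or session_id not in _sessions:
        session_id = uuid.uuid4().hex
        _sessions[session_id] = {}
    return session_id, _sessions[session_id]


def record_visit(session: dict) -> int:
    """Business logic: update state. Knows nothing about VoiceXML."""
    session["visits"] = session.get("visits", 0) + 1
    return session["visits"]


def render_visits(visits: int) -> str:
    """Presentation logic: turn a plain value into a VoiceXML page."""
    return (
        '<?xml version="1.0"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form><block>\n"
        f"    <prompt>This is request number {visits} in this call.</prompt>\n"
        "  </block></form>\n"
        "</vxml>"
    )


def handle_request(session_id: Optional[str] = None) -> Tuple[str, str]:
    """Glue: look up the session, run the logic, render the result."""
    sid, session = get_session(session_id)
    return sid, render_visits(record_visit(session))
```

Because `record_visit` returns a plain value, the same business logic could feed an HTML renderer instead of `render_visits` without modification.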

While the challenges of delivering an enterprise VoiceXML application are similar to those of delivering an enterprise Web application, there are some aspects that are unique to voice, as opposed to HTML, including:

the problem of maintaining state is less difficult with VoiceXML than with HTML

A voice gateway maintains VoiceXML variables in the application scope for the duration of a call. Web browsers, on the other hand, have no clear way of telling when a session begins or ends.

timing is critical

Voice dialogs typically splice together pre-recorded audio and server-generated text-to-speech. Delays in delivering audio content can produce long pauses in the dialog, rendering it highly confusing at best and entirely useless at worst.

dynamic grammars are often required

Since overly complex grammars can degrade the performance of the speech recognition engine, using the server-side application's state information to produce grammars on the fly can often improve recognition quality.
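To make the last point concrete, the sketch below generates a small grammar in the W3C SRGS XML form from server-side state, so that the recognizer only has to listen for the choices that apply to the current caller. The account data, caller IDs, and function names are hypothetical; a real application would draw them from its own back end.

```python
from typing import Dict, List
from xml.sax.saxutils import escape

# Hypothetical server-side state: which accounts each caller holds.
ACCOUNTS_BY_CALLER: Dict[str, List[str]] = {
    "alice": ["checking", "savings"],
    "bob": ["checking", "savings", "money market", "brokerage"],
}


def build_grammar(phrases: List[str]) -> str:
    """Return an SRGS XML grammar that accepts exactly the given phrases."""
    if not phrases:
        raise ValueError("a grammar needs at least one phrase")
    items = "\n".join(f"      <item>{escape(p)}</item>" for p in phrases)
    return (
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"\n'
        '         xml:lang="en-US" mode="voice" root="choice">\n'
        '  <rule id="choice">\n'
        "    <one-of>\n"
        f"{items}\n"
        "    </one-of>\n"
        "  </rule>\n"
        "</grammar>"
    )


def grammar_for_caller(caller_id: str) -> str:
    """Build a grammar limited to the accounts this caller actually has."""
    return build_grammar(ACCOUNTS_BY_CALLER.get(caller_id, []))
```

With this data, the grammar generated for "alice" contains two items rather than every account type the application knows about, which keeps the recognition task small.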

The remainder of this chapter will examine some of the nuts and bolts of developing dynamic voice applications.
