The voice-driven user interface, or VUI, is a technology (combining speech recognition, text-to-speech, and natural language processing) that is increasingly finding its way into the mainstream.  By enabling natural and effective access to information and transactions over the telephone, anytime and anywhere, VUI technologies promise to revolutionize traditional call center and helpdesk functions, increase organizational productivity, improve operating efficiency while reducing costs, and enhance customer satisfaction and employee quality of life.

A recent study conducted by the Gartner Group, involving over 300 corporate customers across four industries (airlines, railroads, car rental, and insurance), found a clear user preference for voice-driven self-service over touch-tone and even Internet self-service channels.  This bodes well for the future of voice-driven interfaces and for their broad market adoption.


EVITA™: An Enterprise Voice Application Platform

While VUI applications are gaining acceptance, the long development process remains a bottleneck to their widespread adoption.  Currently available tools are designed primarily for professional developers and language specialists: those with experience in VoiceXML, speech recognition, dialogue design, and other platform-specific languages.  EVITA, a VUI application development platform developed by WholeTree Technologies in partnership with the Institute for Infocomm Research (I2R), Singapore’s national ICT research institution, addresses many of the issues related to the development of sophisticated enterprise-grade VUI applications.  EVITA adopts a modular and extensible framework for application building.  The complexities of programming directly in VoiceXML are abstracted into a set of flexible plug-and-play modules that incorporate database objects and variable manipulation.  A series of managers is available to help users debug and tune the components of an application.  By combining a database interface, modular design, and code abstraction, EVITA frees users from the complex details and enables them to rapidly develop dynamic and useful VUI applications.  Developing a VUI application becomes a straightforward, web-based form-filling process.  Hence, EVITA lowers the barrier to entry and helps promote the adoption of VUI technology.

EVITA System Architecture

System Design

The EVITA architecture deploys a set of predefined system modules which are modeled on either VoiceXML elements or function calls. These modules are presented to the user as dynamic web pages which respond to input data.  In addition, a series of application managers are available for tuning different components of the application.

These system modules are implemented in JSP and include backend JavaBeans calls to maintain data persistence, manipulate the user database, and perform data processing calculations.  Each JSP module dynamically generates a block of VoiceXML code based on user-specified parameters.  This structure emulates the object-oriented methodology of software development and is designed for flexibility and scalability.  The user sees only a series of web forms in the web interface, which EVITA converts into a sequence of module calls that together form the entire application.
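
The idea of modules that each emit a block of VoiceXML from form parameters, and are then sequenced into one application, can be illustrated with a minimal Java sketch.  This is not EVITA's actual code: the names VxmlModule, PromptModule, and VxmlApp are hypothetical, the parameters stand in for values a user would enter in a web form, and only standard VoiceXML 2.0 elements (vxml, form, block, prompt, goto) are used.

```java
import java.util.List;

/** Hypothetical module contract: render one block of VoiceXML. */
interface VxmlModule {
    String render();
}

/** Hypothetical module that plays a prompt and optionally jumps to another form. */
final class PromptModule implements VxmlModule {
    private final String id;    // form id, assumed to come from a web form field
    private final String text;  // prompt text entered by the user
    private final String next;  // id of the next form, or null to stop here

    PromptModule(String id, String text, String next) {
        this.id = id;
        this.text = text;
        this.next = next;
    }

    @Override
    public String render() {
        StringBuilder sb = new StringBuilder();
        sb.append("  <form id=\"").append(VxmlApp.escape(id)).append("\">\n");
        sb.append("    <block>\n");
        sb.append("      <prompt>").append(VxmlApp.escape(text)).append("</prompt>\n");
        if (next != null) {
            sb.append("      <goto next=\"#").append(VxmlApp.escape(next)).append("\"/>\n");
        }
        sb.append("    </block>\n");
        sb.append("  </form>\n");
        return sb.toString();
    }
}

/** Hypothetical assembler: sequences module output into one VoiceXML document. */
final class VxmlApp {
    /** Escapes the XML special characters in user-supplied text. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Wraps the rendered modules, in order, in a VoiceXML 2.0 root element. */
    static String assemble(List<VxmlModule> modules) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<vxml version=\"2.0\" xmlns=\"http://www.w3.org/2001/vxml\">\n");
        for (VxmlModule module : modules) {
            sb.append(module.render());
        }
        sb.append("</vxml>\n");
        return sb.toString();
    }
}
```

Calling VxmlApp.assemble with a list of two PromptModule objects would yield one document containing two forms in the order given.  Keeping each module behind a single render method is one way to let new module types be added without touching the assembler.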

System Managers

EVITA provides four main managers that let users tune an application separately from the application-building process:

  • The Database Manager allows the user to create, manipulate, and delete database objects which can be accessed by any of their applications.
  • The Grammar Manager allows users to save and share individual and predefined grammar files, generate dynamic grammars from their databases, and create mixed-initiative grammars with guidance.
  • The Prompt Manager allows users to upload, review, and delete audio prompts from the prompt library.
  • The Dictionary Manager allows users to review and add to the system dictionary. Users can check their application grammars for out-of-vocabulary words and add user-defined pronunciations for those words to the dictionary.
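
The out-of-vocabulary check described for the Dictionary Manager can be pictured with a small Java sketch.  It is only an illustration under stated assumptions: the class OovChecker and its method are hypothetical, grammar phrases are assumed to be plain whitespace-separated words, and the dictionary is modeled as a simple map from lowercase word to pronunciation string rather than EVITA's actual dictionary format.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that finds grammar words missing from a pronunciation dictionary. */
final class OovChecker {

    /**
     * Returns every word used in the grammar phrases that has no dictionary
     * entry, lowercased and in first-seen order.
     */
    static Set<String> findOutOfVocabulary(List<String> grammarPhrases,
                                           Map<String, String> dictionary) {
        Set<String> missing = new LinkedHashSet<>();
        for (String phrase : grammarPhrases) {
            for (String token : phrase.trim().split("\\s+")) {
                String word = token.toLowerCase(Locale.ROOT);
                // Blank phrases split into a single empty token; skip it.
                if (!word.isEmpty() && !dictionary.containsKey(word)) {
                    missing.add(word);
                }
            }
        }
        return missing;
    }
}
```

Once the missing words are known, adding a user-defined pronunciation amounts to putting a new entry into the map, after which the same check would no longer report that word.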