
THE DATA STREAM FOR VISIONARIES OF THE CONVERGENCE ERA      
Feature  June 2001

Asking for it
How voice-portal technology is advancing the evolution in communications
Bob Schechter, NMS Communications

Voice-driven applications have advanced to the point where they are beginning to provide access to Internet-based information through ordinary telephones. Voice portals, in particular, are generating keen interest. With core technologies now available, and with the difficulties of Web access on mobile devices becoming clear, voice-enabled applications hold great promise.

Yet it remains to be seen precisely how and when voice-driven services will enter the mainstream of daily commerce and everyday life. One thing is clear: Effectively extending a voice-enabled Web of possibilities to the world’s more than 500 million mobile users is a critical next step in the communications evolution.

Imagine, for example, a virtual personal assistant at the ready through your cell phone 24x7x365. This tireless assistant keeps tabs on your schedule, location, contacts, and preferences; immediately connects you to anyone reachable by telephone, email, or fax; retrieves information about any event, topic, person, or place; and enables instant access to life-saving personal health information. With voice-portal and wireless technology, all this and more can be available through the telephone. What’s more, speech-recognition technology could also act as the interface between you and highly specialized intelligent devices located in your home, your car, or in your pocket—everywhere you go.

Early portals

Voice portals provide access to many kinds of information by voice—using a voice browser—over the telephone. In the same way Internet portals such as Lycos and Yahoo! let you access information by typing on the PC, voice portals provide access to a variety of information and Web-based content from any telephone—including wireless telephones.

Voice-interface terminology, as in all fast-changing technology areas, is evolving, so some definitions are useful. Stand-alone, voice-enabled Web sites let you interact with narrow Web content by speaking commands into the telephone. Such voice-enabled sites, like those offered by American Airlines, Charles Schwab, and UPS, let you ask for provider-specific information such as flight arrival and departure information, real-time stock quotes, or package delivery status. They also let you perform actions such as placing trade orders or scheduling package pick-ups. These services often confirm your selections with a spoken summary of your request. They mimic the most natural, and therefore convenient, way humans interact—by speaking and listening—using the ubiquitous telephone.

Voice portals such as BeVocal and Tellme provide audio content on demand via a voice browser, also using toll-free numbers and speech recognition. Currently, about a dozen voice portals offer similar capabilities. To use BeVocal, for example, you call a toll-free number and, using simple voice commands, select from a range of information services: driving directions, weather, stock quotes, flight information, news, sports, horoscopes, lottery, TV dramas, or business information. Typically, these portals can personalize services according to your areas of interest. Other types of customization are available, too, such as setting up home and work addresses to save you time when requesting driving directions.

Another group of voice portals combines Internet information access with integrated communications and personal information management (PIM) functions. Voice portals like HeyAnita and AOL by Phone provide unified inbox capabilities, allowing users to hear email, voice messages, and faxes, as well as information such as stock quotes. Still others, like Webley and Conita, combine personal-assistant features that provide a single point of contact and “intelligence” in the form of personal preferences.

Voice portals employ the latest speech-recognition improvements. In the past, consumers groaned when a computer answered the telephone rather than a human. They found the robotlike speech of automated systems, as well as the inflexible interfaces, annoying. For example, speech-recognition systems couldn’t handle caller interruptions. If you were “rude” and started to speak before hearing all the options, the robot voice on the other end would just keep talking. New features like “barge in,” which enables interruptions, give automated systems a more natural feel and flexibility. You can simply ask for what you want, with a specific command like “Restaurants,” and move on.
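The difference barge-in makes can be sketched with a toy simulation. This is only an illustration under stated assumptions: the function name `play_menu`, the option list, and the way caller speech is modeled (a mapping from option index to the spoken word) are all hypothetical and do not describe any vendor's system.

```python
def play_menu(options, caller_speech, barge_in=True):
    """Simulate reading a menu aloud.

    options       -- menu items read to the caller, in order
    caller_speech -- hypothetical model of the caller: maps the index of the
                     option being read to the word the caller says over it
    barge_in      -- whether the system listens while the prompt is playing

    Returns (options_heard, recognized_command).
    """
    heard = []
    for index, option in enumerate(options):
        heard.append(option)
        spoken = caller_speech.get(index)
        if barge_in and spoken is not None:
            # The caller interrupted: stop the prompt and act on the command.
            return heard, spoken
    # Without barge-in, anything said during playback was ignored,
    # so the caller has heard every option and nothing was recognized.
    return heard, None


menu = ["weather", "stock quotes", "restaurants", "news"]
interruption = {1: "restaurants"}  # caller speaks while option 2 is being read

print(play_menu(menu, interruption, barge_in=True))
print(play_menu(menu, interruption, barge_in=False))
```

With barge-in enabled the prompt stops after two options and the command is captured; without it the caller sits through the full list and must speak again afterward.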

Speaking naturally

Consumers are showing their approval for these new developments. A recent study conducted by Evans Research shows increasing consumer acceptance of speech recognition across a broad range of applications and services. In the study, more than 80 percent of users preferred speech systems over touch-tone systems, the Web, or speaking with live agents and operators. Satisfaction among wireless telephone users was even higher, at 96 percent.

With the proliferation of wireless devices and device convergence, voice interfaces make even more sense. Tiny displays and keypads make wireless devices unsuitable for much typing. In fact, the need to type or use a stylus runs counter to the main benefit of wireless devices: mobility. As wireless devices increase in computing power, voice interfaces will become not just a desirable feature but a basic requirement.

Key players

Many entities—telephone companies, Internet portals, and others—are making their moves toward voice portals. Orange Networks, for example, one of the UK’s fastest-growing mobile operators, last year put voice-activated assistant technology into its network with Wildfire. Wildfire is a personal assistant that accepts voice commands to manage contacts, dial outgoing calls, handle faxes, and screen calls.

Also notable was AT&T’s $60 million investment in Tellme Networks. AT&T’s minority stake has given Tellme access to AT&T’s network and its 60 million consumers. Qwest Wireless, another recent entrant, now offers voice browsing for about $5 a month using BeVocal’s applications and hosting technology. Users reach the Qwest automated assistant by dialing *www (*999).

AOL, as another example, last year acquired the Quack.com voice portal in a shortcut move to voice access. By October, AOL had launched AOL by Phone, providing 25 million users with access to email, news, weather, and stock quotes via voice commands. Undoubtedly, AOL/Time Warner’s reach and content will expose many new users to the convenience of voice browsing.

Internet portals are following suit. Yahoo! now provides users with voice access, as does Lycos. PDA manufacturers are making their moves, too. Accessories like the VisorPhone, which turns a Visor PDA into a cell phone, are clearing the way for voice applications.

The early moves of these key players point squarely in the voice-access direction. They are responding to four major trends: the growth of the Internet, the increase in affordable and available mobile telephone service, the demand for content anywhere, anytime, and advances in the enabling core technologies.

Add to these forces the mass appeal of the convenience, productivity, and mobility voice interfaces provide, and you can see why the number of voice-portal users is expected to grow to 45 million by 2005. The Kelsey Group projects that worldwide spending and revenue from the voice Web and voice applications will reach $41 billion by 2005.

Vendors of text-to-speech and voice-recognition technology will obviously benefit from this demand, as will voice-portal application providers and providers of WebTalk services (which provide combined Web surfing and chat with one or more people). However, these trends are opening new market opportunities for others, too. For infrastructure providers, the opportunities lie in network platforms, servers, and gateways. And carriers and service providers will be able to expand and retain their customer bases and generate revenue through enhanced services.

Open portals

Voice portals can be expected to evolve from a narrow, proprietary model with inherently limited value to the consumer to a more open, standards-based, value-differentiated model.

Today’s voice portals share many common traits besides offering much the same types of information access. First, they work much like traditional IVR (interactive voice response) systems in that they use a specific set of spoken commands, rather than a robust vocabulary. The services are typically free or offered at a premium. Moreover, today’s service providers build and own the entire voice portal and update the database content. The content is customized wholly for voice access and is often kept in a proprietary format.

In contrast, we can expect tomorrow’s voice portals to access information from many content providers. To do so, they will need to be able to access the different types of media on the Web to meet our expectations of true voice browsing. With the high level of personalization and value that voice portals make possible, customers will tend to stay with a service provider. Consequently, service providers will own detailed customer profiles, creating new commerce opportunities.

How far can we go with voice-driven applications? One vision of the future is the human-centric computing environment called Oxygen, under development at the Laboratory for Computer Science at MIT. Michael Dertouzos, the director, describes the environment in The Unfinished Revolution: “Instead of you going to the machine...the system is all around your human world, ready to handle your needs. Interactions between you and the system become natural through speech and vision...You no longer have to plan what you’ll type, you just react to what you see and hear by holding a dialogue with the machine. Devices, especially for mobile use, become anonymous and acquire the ‘info personality’ of whoever is using them at the moment. Security, too, becomes person-centered rather than device-centered.”

Accordingly, evolving applications based on wireless technology, such as Orange’s home of the future, seem well within grasp. “Orange at Home” is a detached house that integrates a range of advanced services using wireless technology. Such a house would offer work-at-home office communication, entertainment, home systems, information access, and more with the telephone serving as the remote control for all of it. Numerous technologies are making this viable, such as broadband access to the home through satellite, cable, and DSL, together with new wireless technologies such as Bluetooth, IEEE 802.11 wireless LANs, and HomeRF.

According to a recent PELORUS Group report on wireless speech recognition, more than 40 percent of US households have Internet access and more than 108 million have mobile-telephone accounts. Projections show that half of the US population will have mobile subscriptions in the next three years, with speech-enabled wireless subscriptions increasing from 5 percent of users today to nearly 50 percent—a market size of $3 billion—by 2005.

Further implications

With the ability to use awareness (showing whether a user is present online), location-specific information, and a choice of interfaces that suits the task (speech, touchscreen, or conventional point-and-click), our expectations of technology will continue to rise. Users will demand the right information, at the right time, in the right form, with the right disposition options for that information. Businesses will have the greatest need for this four-way fit.

Business applications are ripe for voice access. An enterprise voice portal could provide employees with call-in access to content on enterprise Web pages or customer-contact database information. Business voice portals would provide an expanded communication portal from which to retrieve enterprise email, documents, faxes, and more.

GM’s OnStar, as another example, uses its voice-based navigation and communication services to bring voice services to General Motors automobiles. By changing the voice type, gender, and script, automakers can extend branding qualities to the voice that gives you directions to the airport.

For all types of businesses, voice portals can offer benefits such as greater user acceptance than IVR systems, reduced call-center costs, a greater variety of applications, and the ability to automate more types of customer service. But voice portals offer unique benefits in vertical-market applications.

In finance, for example, businesses can expand what IVR systems now provide. Voice portals can let users enter stock names, receive alerts, and act immediately on them. With the addition of specialized terminology, legal professionals can speak in legal terms and health personnel can use medical jargon. These types of voice-driven applications will enable businesses to minimize response times, improve competitiveness, and reduce costs while ensuring quality.

Challenges to success

What hurdles exist to tapping these opportunities? Several technical challenges need to be addressed. Speech-enabled applications are technically complex, requiring powerful host processors and scalable systems. Large subscriber bases require hundreds of ports initially, but to meet the expected demand, systems will need to scale to thousands of ports in the near term. Reliability is another requirement for network-ready systems. As a result, players with carrier-grade infrastructure are taking an early leadership role in voice portals. Finally, systems management must support remote administration and configuration, control of quality of service, and management of security.
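To give a feel for how subscriber counts turn into port counts, here is a rough sizing sketch. It is not taken from the article: it assumes the classic Erlang B blocking model from telephone traffic engineering, and every input (subscriber count, calls per busy hour, call length, target blocking rate) is a hypothetical figure chosen only for illustration.

```python
def offered_load(subscribers, calls_per_busy_hour, avg_call_minutes):
    """Busy-hour traffic in erlangs (average number of simultaneous calls)."""
    return subscribers * calls_per_busy_hour * avg_call_minutes / 60.0


def erlang_b(offered_erlangs, ports):
    """Probability that a call finds all ports busy (Erlang B recursion)."""
    blocking = 1.0  # with zero ports, every call is blocked
    for n in range(1, ports + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


def ports_needed(offered_erlangs, max_blocking):
    """Smallest port count whose blocking probability meets the target."""
    if not 0.0 < max_blocking < 1.0:
        raise ValueError("max_blocking must be between 0 and 1")
    ports, blocking = 0, 1.0
    while blocking > max_blocking:
        ports += 1
        blocking = offered_erlangs * blocking / (ports + offered_erlangs * blocking)
    return ports


# Hypothetical scenario: 100,000 subscribers, each placing 0.1 calls in the
# busy hour, 3 minutes per call, with at most 1 percent of calls blocked.
load = offered_load(100_000, 0.1, 3.0)
print(f"offered load: {load:.0f} erlangs")
print(f"ports needed: {ports_needed(load, 0.01)}")
```

Scaling the subscriber figure up or down shows how quickly a deployment moves from hundreds of ports into the thousands under these assumptions.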

The same ubiquity that will help drive demand will also present challenges for multiple-language, worldwide deployment. The large installed base of PSTN (public switched telephone network) services will need time and resources to transition to the next-generation network. And service issues of conflicting standards and protocols, dynamic bandwidth management, and security will remain.

[Figure: Text to speech. VoiceXML promises to act as a bridge between text-based Web content and speech-based applications.]

VoiceXML, a scripting language used to design speech applications, is one of the relevant standards for interoperability that will help bridge the telephone and Web content (see the figure). Currently, 100 companies support the VoiceXML specification. Adoption of VoiceXML will accelerate the voice application market. VoiceXML promises to bring Web development and content delivery to voice-response applications, thus easing development and improving time to market. Web server test beds and hosts, integrated VoiceXML browser platforms, easy-to-use APIs (application programming interfaces), and developers who can transfer their skills to voice browsers will move the development of voice-driven applications forward.
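As a rough illustration of how ordinary Web tooling can produce voice dialogs, the sketch below uses Python's standard XML library to generate a small VoiceXML menu. The `vxml`, `menu`, `prompt`, and `choice` elements come from the VoiceXML specification; the helper name `build_menu`, the prompt wording, and the example.com URLs are hypothetical, and a real deployment would need whatever additional attributes and grammars its voice browser requires.

```python
import xml.etree.ElementTree as ET


def build_menu(prompt_text, choices):
    """Return a VoiceXML document (as a string) for a simple spoken menu.

    choices maps the phrase a caller may say to the URL of the next dialog.
    """
    vxml = ET.Element("vxml", {"version": "1.0"})
    menu = ET.SubElement(vxml, "menu")

    # The prompt is rendered to the caller by text-to-speech.
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text

    # Each choice pairs a recognizable phrase with the dialog to fetch next.
    for phrase, url in choices.items():
        choice = ET.SubElement(menu, "choice", {"next": url})
        choice.text = phrase

    return ET.tostring(vxml, encoding="unicode")


document = build_menu(
    "Say weather, stock quotes, or driving directions.",
    {
        "weather": "http://example.com/weather.vxml",
        "stock quotes": "http://example.com/quotes.vxml",
        "driving directions": "http://example.com/directions.vxml",
    },
)
print(document)
```

A Web server could return a document like this in response to an HTTP request from a voice browser, which is the sense in which existing Web development skills carry over to voice applications.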

Security issues arising out of location-based capabilities are emerging and will need to be addressed. How do you balance the super-convenience of anytime, anywhere access against questions of privacy? How do you achieve Star Trek and not Big Brother?

Evolution of opportunities

The appeal of personalization and the lure of convenience will mean consumers will be using lots of connect time. You’ll retrieve your messages (email, voice, and fax), access a restaurant review, make a reservation, get driving directions, check movie times or buy theater tickets, check stock quotes, and call your friend—all without ever hanging up the telephone or needing to dial more than one number.

Some voice portals are finding innovative ways to generate revenue in addition to the typical Internet advertising model. Potential revenue streams include hosting applications, licensing software for enterprises that want smaller voice portals, and handling voice-communication transactions generated by telephone order-taking or v-commerce.

Personalization makes these services stick. It creates a convenience and familiarity that users are unwilling to give up once they’ve tried it. Telephone companies have witnessed this phenomenon with personalized enhanced services, such as voicemail and speed dialing, which not only increase telephone usage but also become highly profitable services on their own.

The pace of Internet growth has created many unmet business and consumer needs for greater customization of Internet access. And that, in turn, has created more opportunities to do business. The growth of the Internet has also produced a pool of technical resources that can be leveraged for voice-driven applications.

For all this, the communications evolution has just begun. According to MIT’s Dertouzos, all the content in the world, from every source, traditional or computer-based, makes up less than 5 percent of the world’s industrial economy. However, 50 percent, or $10 trillion, is office work, only a tiny portion of which is so far transacted over the Internet. There’s ample evidence that a voice-enabled Web is the critical next step in tapping this commerce potential. The fusion of wireless trends, the confidence of key business players, and the overcoming of technical hurdles heralds a new way of living, working, and playing: picking up a telephone and just talking.

Author information


Bob Schechter is CEO and chairman of NMS Communications (www.nmscommunications.com), which provides technology for today’s high-value communications systems, including voice-driven Web services, packet voice, and broadband-access services.













 

Copyright © 2000-2008 Cahners Business Information, A Division of Reed Elsevier, Inc.