<Digital Equipment Corporation>
<Digital Semiconductor>
Speech recognition: A performance boost brings it to the mainstream
The Current State of the Market
Until only a short while ago, speech recognition was largely an
immature technology in search of a market. The first systems had to
be trained by the individuals that would use them, and were capable of
identifying only a limited vocabulary of words. From its humble
beginnings recognizing tens of words for command-and-control
applications, the technology has evolved to enable the understanding
of thousands of words in full-fledged speech dictation systems.
Today, speech recognition is making major strides in its ability to
provide information to consumers, dictation capabilities to physicians
and lawyers, and reduced costs for large companies with extensive
customer service units.
Market research firms predict that the industry, which topped $340
million in 1994, is on its way to $1 billion by the end of the
century. An estimated 30 percent of revenue comes from telephone
applications, while the balance comes from speech-to-text products,
data-entry applications, consumer-market applications and
speech-verification products. Speech recognition is beginning to
arrive on the desktop with increasing frequency. As with most
emerging technologies, the market and the products are diverse and
the landscape can be confusing.
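As a rough check on the forecast above, the implied annual growth rate can be worked out from the two figures quoted. This is only a sketch: the six-year span (1994 to 2000) is an assumption about what "the end of the century" means, and the function name is illustrative rather than anything from the market research cited.

```python
# Figures quoted in the text, in millions of US dollars.
REVENUE_1994 = 340.0
REVENUE_TARGET = 1000.0

# Assumption: "end of the century" is read as the year 2000, i.e. six years of growth.
YEARS = 2000 - 1994


def implied_annual_growth(start, end, years):
    """Compound annual growth rate needed to go from start to end in the given years."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    rate = implied_annual_growth(REVENUE_1994, REVENUE_TARGET, YEARS)
    print(f"Implied growth: {rate:.1%} per year over {YEARS} years")
```

Under that assumption the forecast implies growth of roughly 20 percent per year; a shorter span would imply a higher rate.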
Current Technology and Applications
Telephony and speech-to-text applications are in the greatest demand.
At the high end of telephony applications are products being offered
by Nynex, the baby Bells and other telephone carriers. At a lower
level are computer telephony applications such as those based on
interactive speech response. The speech-to-text arena includes what
are often referred to as dictation systems: sophisticated systems that
can recognize more than 50,000 words. Most desktop applications fall
under either command-and-control or data entry product types.
Command-and-control products are basically speech-activated "hot
keys," and data-entry products usually generate forms but are also
based on command-and-control. The technology is basically the same
for all these product types, but it is a technology that lends itself
to great product diversity.
Discrete or Continuous
Further, speech recognition systems can either be discrete, requiring
the user to pause between each utterance ("dial [pause] 5
[pause]...") or continuous, allowing the user to talk
conversationally: "dial 555-1234." Systems based on continuous
recognition are more natural and efficient for users than discrete
systems, but the speech interpretation and analysis is more complex.
Whether a system is discrete or continuous, there are two types of
recognition: speaker-dependent and speaker-independent.
Speaker Dependent or Independent
Speaker-dependent systems allow the use of specific spoken phrases
that are unique to one individual. The user repeats each phrase two
or three times to create a speech model.
As the name implies, speaker-independent systems understand commands
regardless of the speaker: a system can recognize specific words
without prior training. Word models are created from samples of a
broad range of people saying the word, or else developed phonetically
using linguistic methods. Word-based systems offer the advantage of
higher recognition accuracy but are less flexible than
speaker-dependent systems, since many samples must be collected to
build models every time new words are added to the vocabulary.
Future Directions
Several improvements in recognition technology have fueled growth in
speech systems. Better processing and filtering techniques improve
the quality of the speech signal, making recognition more reliable.
Systems are becoming easier to use and more natural with the
incorporation of full hands-free speech control and other speech
technologies.
There is no disputing the fact that speech recognition and telephony
are coming together, which makes sense given lower pricing and the
technology's virtue as a more natural interface. But it's early in the
game and many vendors are still working out the applications. At its
best, speech recognition combined with telephony will be the
equivalent of having a personal assistant, especially useful in the
mobile workforce. The proliferation of phones will likely accelerate
the use of speech recognition in customer service applications.
Vendors also peg the millions of cellular phone users as a hot target
market. Speech recognition would let them keep their hands on the wheel instead
of on a handset.
The cost of incorporating speech recognition in PCs is drastically
lower than in the past, which could bring speech-enabled systems
closer to corporate usage. Most audio cards using speech recognition
provide users with a low-cost subset of full-blown large-vocabulary
recognition systems, which usually require a separate add-in card and
cost at least $500. PC companies are also adopting speech recognition
as the technology becomes easier to implement on standard sound cards.
What the End Users Say
A survey at Spring Comdex '95 revealed that nearly two thirds of
respondents believe that speech recognition will be very valuable to
computing applications over the next three years, while another 27%
see it as changing the nature of computing. Further, 83% of those
surveyed said that speech recognition will benefit the average
computer user within five years. Half of these people said within two
years is a more likely estimate.
When Link Resources conducted an end user telephone survey in 1994,
over 70% of mobile executives and professionals said that speech
input would be useful to extremely useful, and 75% of mobile
industrial workers indicated the same.
The Final Hurdles
In a June 1995 review of five dictation packages, The Seybold Report
on Desktop Publishing delineated a list of features that could
represent the last hurdles to speech recognition becoming mainstream:
continuous recognition, both accuracy and "smarts," integration with a
word processor, speech commands for other applications, no hands
required, no training period, easier installation with no new
hardware, less stringent hardware requirements, and affordability.
Further, speech recognition has been limited to desktop systems.
Estimates indicate that recognizing 5,000 words and phrases requires
more than 100 Dhrystone2.1 MIPS of performance. This type of
performance has traditionally been added through the use of digital
signal processors (DSP) and associated controller circuitry, which
add to board space and total system cost. Software-based systems
such as Dragon Dictate address manufacturing cost considerations but
require that the processor assume more DSP functions, raising again
the need for increased performance. Since speech recognition has been
identified as a key requirement for mobile workers, being tied to the
desktop represents a significant inhibitor to mainstream adoption.
Digital Semiconductor Offers Enabling Technology
The SA-110, Digital's first StrongARM family member, provides
performance matched only by desktop processors and has power
dissipation levels well below those required for portable
battery-operated handheld products. The SA-110 is offered in two
speed variants for low power handheld systems. The first operates at
100MHz with an estimated performance rating of 115 Dhrystone2.1 MIPS
and power dissipation of less than 300 mW. A 160MHz version yields
185 MIPS at less than 450 mW of power consumption. While clock rate
is a performance enabler, other StrongARM architectural features
provide significant performance improvements, making the SA-110 an
ideal speech recognition enabler for handheld products.
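The figures above can be compared directly with the 100 Dhrystone2.1 MIPS requirement cited earlier for a 5,000-word vocabulary. The short sketch below does that arithmetic. It uses only the numbers quoted in the text; the function and variable names are illustrative, and because the power figures are stated as upper bounds ("less than"), the MIPS-per-watt values it computes are lower bounds.

```python
# Figures quoted in the text for the two SA-110 speed variants.
# Power values are upper bounds ("less than 300 mW", "less than 450 mW").
SA110_VARIANTS = {
    100: {"mips": 115, "max_power_mw": 300},
    160: {"mips": 185, "max_power_mw": 450},
}

# The text cites "more than 100" Dhrystone2.1 MIPS for 5,000 words and phrases.
REQUIRED_MIPS = 100


def mips_per_watt(mips, power_mw):
    """Performance per watt; a lower bound when power_mw is an upper bound."""
    return mips / (power_mw / 1000.0)


def headroom(mips, required=REQUIRED_MIPS):
    """Ratio of available performance to the cited requirement."""
    return mips / required


if __name__ == "__main__":
    for mhz, v in sorted(SA110_VARIANTS.items()):
        efficiency = mips_per_watt(v["mips"], v["max_power_mw"])
        ratio = headroom(v["mips"])
        print(f"{mhz} MHz: at least {efficiency:.0f} MIPS/W, "
              f"{ratio:.2f}x the cited requirement")
```

On these numbers both variants clear the cited threshold, the 100MHz part narrowly and the 160MHz part with considerable margin.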
The 0.35-micron CMOS process allows the cost effective addition of
larger caches, enhancing processor performance without burdening the
system designer with a requirement for expensive external memory.
This reduces traffic on the bus, which improves performance and
minimizes power consumption. Further, the SA-110's high performance
cache architecture and multiplier substantially improve the execution
time of software applications.
The combination of clock rate and architectural features provides
performance improvements enabling compute-intensive applications like
Dragon Systems' industry-leading desktop speech recognition to move to
the portable environment. Digital has already ported its DECtalk
speech synthesis application to the ARM architecture, giving handheld
computing access to hands-free text-to-speech capability. Together
with speech recognition product refinements, the SA-110's higher
performance can enable more rapid acceptance of speech recognition as
a standard technology for handheld computing.
Note:
Digital, Digital Semiconductor, DECtalk and the Digital logo are all
trademarks of Digital Equipment Corporation. IBM is a registered
trademark of International Business Machines Corporation. Dragon
Dictate is a trademark of Dragon Systems, Inc. ARM is a registered
trademark, and StrongARM is a trademark of Advanced RISC Machines,
Ltd.
February 1996
Updated: Monday, February 5, 1996