Text File | 1995-07-25 | 55.8 KB | 1,200 lines |
- Subject: comp.speech Frequently Asked Questions - part 3/3
- Newsgroups: comp.speech,comp.answers,news.answers
- From: andrewh@speech.su.oz.au (Andrew Hunt)
- Date: 10 Nov 1994 01:29:16 GMT
-
- Archive-name: comp-speech-faq/part3
- Last-modified: 1994/11/04
-
-
- COMP.SPEECH FAQ POSTING - PART 3/3
-
-
- [Note: this document has been automatically extracted from
- a WWW site. This may introduce some formatting errors.]
-
-
-
- ===========================================================================
-
-
- FAQ SECTION 5 - Speech Synthesis
-
- Q5.1: WHAT IS SPEECH SYNTHESIS?
-
- Speech synthesis is the task of transforming written input to spoken
- output. The input can either be provided in a graphemic/orthographic
- or a phonemic script, depending on its source.
- _________________________________________________________________
-
- Q5.2: HOW CAN SPEECH SYNTHESIS BE PERFORMED?
-
- There are several algorithms. The choice depends on the task they're
- used for. The easiest way is to just record the voice of a person
- speaking the desired phrases. This is useful if only a restricted
- volume of phrases and sentences is used, e.g. messages in a train
- station, or schedule information via phone. The quality depends on the
- way recording is done.
-
- More sophisticated, but lower in quality, are algorithms which split the
- speech into smaller pieces. The smaller those units are, the fewer of
- them are needed, but the quality also decreases. An often used unit is
- the phoneme, the smallest linguistic unit. Depending on the language
- used there are about 35-50 phonemes in western European languages,
- i.e. there are 35-50 single recordings. The problem is combining them,
- as fluent speech requires fluent transitions between the elements. The
- intelligibility is therefore lower, but the memory required is small.
-
- A solution to this dilemma is using diphones. Instead of splitting at
- the transitions, the cut is done at the center of the phonemes,
- leaving the transitions themselves intact. This gives about 400
- elements (20*20) and the quality increases.
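-
- As a rough illustration of the diphone idea (a sketch only, not taken
- from any of the systems listed in this FAQ), the Python fragment below
- turns a phoneme sequence into diphone unit names and computes the upper
- bound on inventory size. The function names, the "#" silence marker and
- the example phoneme symbols are all assumptions made for illustration.
- The 20*20 figure above corresponds to an inventory of 20 phonemes.
-
```python
def to_diphones(phonemes, boundary="#"):
    """Split a phoneme sequence into diphone unit names.

    Each unit runs from the centre of one phoneme to the centre of the
    next, so every phoneme-to-phoneme transition stays inside one unit.
    `boundary` is a hypothetical symbol for silence at either end.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def max_diphone_inventory(n_phonemes):
    """Upper bound on the number of diphones: every ordered phoneme pair."""
    return n_phonemes * n_phonemes


print(to_diphones(["h", "e", "l", "ou"]))
print(max_diphone_inventory(20))
```
-
- Not every phoneme pair occurs in a given language, so a real inventory
- can be smaller than this bound.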
-
- The longer the units become, the more elements there are, but the
- quality increases along with the memory required. Other units which
- are widely used are half-syllables, syllables, words, or combinations
- of them, e.g. word stems and inflectional endings.
- _________________________________________________________________
-
- Q5.3: WHAT ARE SOME GOOD REFERENCES/BOOKS ON SYNTHESIS?
-
- The following are good introductory books/articles.
- * Douglas O'Shaughnessy -- Speech Communication: Human and Machine,
- Addison Wesley series in Electrical Engineering: Digital Signal
- Processing, 1987.
- * D. H. Klatt, "Review of Text-To-Speech Conversion for English",
- Jnl. of the Acoustical Society of America (JASA), v82, Sept. 1987,
- pp 737-793.
- * I. H. Witten. Principles of Computer Speech. (London: Academic
- Press, Inc., 1982).
- * John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to
- Speech: The MITalk System", Cambridge University Press, 1987.
-
- _________________________________________________________________
-
- Q5.4: WHAT SPEECH SYNTHESIS SOFTWARE/HARDWARE IS AVAILABLE?
-
- Please email any updates, corrections or additions to the following
- list. The range of commercially available synthesis software is
- growing rapidly so any help in keeping up to date will be appreciated.
-
- Orator Text-to-Speech Synthesizer
- * Platform: SUN SPARC, Decstation 5000. Portable to other UNIX
- platforms.
- * Description: Sophisticated speech synthesis package. Has text
- preprocessing (for abbreviations, numbers), acronym citation
- rules, and human-like spelling routines. High accuracy for
- pronunciation of names of people, places and businesses in
- America, text-to-speech translation for common words; rules for
- stress and intonation marking, based on natural-sounding
- demisyllable synthesis; various methods of user control and
- customization at most stages of processing. Currently, ORATOR is
- most appropriate for applications containing a large component of
- names in the text, and requires some amount of user-specified
- text-preprocessing to produce good quality speech for general
- text.
- * Hardware: Standard audio output of SPARC, or Decstation audio
- hardware. At least 16M of memory recommended.
- * Cost: Binary License: $5,000. Source license for porting or
- commercial use: $30,000.
- * Availability: Contact Bellcore's Licensing Office
- (1-800-527-1080) or email John Zilg jzilg@cc.bellcore.com
-
- Text to phoneme program (1)
- * Platform: unknown
- * Description: Text to phoneme program. Based on Naval Research
- Lab's set of text to phoneme rules.
- * Availability: by anonymous ftp
- + ftp://shark.cse.fau.edu/pub/src/phon.tar.Z
-
- Text to phoneme program (2)
- * Platform: unknown
- * Description: Text to phoneme program.
- * Availability: by anonymous ftp
- + ftp://wuarchive.wustl.edu/mirrors/unix-c/utils/phoneme.c
-
- Text to phoneme program (3)
- * Description: A public domain version of the same Naval Research
- Lab text to phoneme rules.
- * Availability: By anonymous ftp
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/english2phoneme.shar
-
- Text to speech program
- * Description: An implementation of the Klatt phoneme to waveform
- speech synthesiser.
- * Availability: By anonymous ftp
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/klatt-0.02.tar.Z
-
- "Speak" - a Text to Speech Program
- * Platform: Sun SPARC
- * Description: Text to speech program based on concatenation of
- pre-recorded speech segments. A function library can be used to
- integrate speech output into other code.
- * Hardware: SPARC audio I/O
- * Availability: by anonymous ftp
- + ftp://wilma.cs.brown.edu/pub/speak.tar.Z
-
- TheBigMouth - a Text to Speech Program
- * Platform: NeXT
- * Description: Text to speech program based on concatenation of
- pre-recorded speech segments. NeXT equivalent of "Speak" for Suns.
- * Availability: try NeXT archive sites such as
- sonata.cc.purdue.edu.
-
- TextToSpeech Kit
- * Platform: NeXT Computers
- * Description: The TextToSpeech Kit does unrestricted conversion
- of English text to synthesized speech in real-time. The user has
- control over speaking rate, median pitch, stereo balance, volume,
- and intonation type. Text of any length can be spoken, and
- messages can be queued up, from multiple applications if desired.
- Real-time controls such as pause, continue, and erase are
- included. Pronunciations are derived primarily by dictionary
- look-up. The Main Dictionary has nearly 100,000 hand-edited
- pronunciations which can be supplemented or overridden with the
- User and Application dictionaries. A number parser handles numbers
- in any form. A letter-to-sound knowledge base provides
- pronunciations for words not in the Main or customized
- dictionaries. Dictionary search order is under user control.
- Special modes of text input are available for spelling and
- emphasis of words or phrases. The actual conversion of text to
- speech is done by the TextToSpeech Server. The Server runs as an
- independent task in the background, and can handle up to 50 client
- connections.
- * Misc: The TextToSpeech Kit comes in two packages: the Developer
- Kit and the User Kit. The Developer Kit enables developers to
- build and test applications which incorporate text-to-speech. It
- includes the TextToSpeech Server, the TextToSpeech Object, the
- pronunciation editor PrEditor, several example applications,
- phonetic fonts, example source code, and developer documentation.
- The User Kit provides support for applications which incorporate
- text-to-speech. It is a subset of the Developer Kit.
- * Hardware: Uses standard NeXT Computer hardware.
- * Cost:
- + TextToSpeech User Kit: $175 CDN ($145 US)
- + TextToSpeech Developer Kit: $350 CDN ($290 US)
- + Upgrade from User to Developer Kit: $175 CDN ($145 US)
- * Availability: Trillium Sound Research
- 1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
- Tel: (403) 284-9278 Fax: (403) 282-6778
- Order Desk: 1-800-L-ORATOR (US and Canada only)
- Email: TTSInfo@trillium.ab.ca
-
- SGI Developers Toolbox Synthesiser
- * Platform: SGI
- * Description: The SGI Developer Toolbox 4.0 CDROM contains a
- basic public domain text-to-speech program in the publics/speak
- directory. The directory includes man pages and source.
- * Availability: on the SGI Developer Toolbox 4.0 CDROM
-
- rsynth
- * Platform: Various (including Sun, Linux, NeXT, SGI)
- * Description: Text-to-speech converter produced by combination of
- various public-domain pieces.
- * Price: Free
- * Availability: by anonymous ftp from
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/rsynth-1.0.tar.Z
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/rsynth-1.0.tar.gz
-
- SENSYN speech synthesizer
- * Platform: PC, Mac, Sun, and NeXT
- * Rough Cost: $300
- * Description: This formant synthesizer produces speech waveform
- files based on the (Klatt) KLSYN88 synthesizer. It is intended for
- laboratory and research use. Note that this is NOT a
- text-to-speech synthesizer, but creates speech sounds based upon a
- large number of input variables (formant frequencies, bandwidths,
- glottal pulse characteristics, etc.) and would be used as part of
- a TTS system. Includes full source code.
- * Availability: Sensimetrics Corporation
- 64 Sidney Street, Cambridge MA 02139.
- Fax: (617) 225-0470; Tel: (617) 225-2442.
- Email: sensimetrics@sens.com
-
- spchsyn.exe
- * Platform: PC?
- * Availability: By anonymous ftp as a self extracting DOS archive.
- + ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/spchsyn.exe
- * Requirements: May require special TI product(s), but all source
- is there.
-
- CSRE: Canadian Speech Research Environment
- * Platform: PC
- * Cost: Distributed on a cost recovery basis.
- * Description: CSRE is a software system which includes, in
- addition to the Klatt speech synthesizer, SPEECH ANALYSIS and an
- EXPERIMENT CONTROL SYSTEM. A paper about the whole package can be
- found in:
- + Jamieson D.G. et al, "CSRE: A Speech Research Environment",
- Proc. of the Second Intl. Conf. on Spoken Language
- Processing, Edmonton: University of Alberta, pp. 1127-1130.
- * Hardware: Can use a range of data acquisition/DSP hardware.
- * Availability: For more information contact
- Krystyna Marciniak
- email march@uwovax.uwo.ca
- Tel (519) 661-3901 Fax (519) 661-3805.
- For technical information email ramji@uwovax.uwo.ca
- * Note: A more detailed description is given in Section 1.9 on
- speech environments.
-
- Eloquence (currently an alpha release)
- * Platform: Windows and Solaris
- * Description: Software based text-to-speech package. Generates
- waveforms completely algorithmically instead of by concatenating
- waveforms, for maximum flexibility and naturalism. For instance,
- when the user requests a deeper voice, the software simulates a
- larger vocal tract, instead of simply pitch-shifting samples.
-
- Uses high-level linguistic parsing, which obviates the need for a
- huge dictionary. Handles numbers, acronyms, currency, etc.
- Includes a set of annotation symbols, for placing stress on
- particular words, expressing excitement/boredom, etc. Also allows
- phonetic input. The final version, including support for Windows
- DDE and OLE and UNIX Sockets, will be released by the end of 1994.
-
- Produces male and female voices for General American English.
- Dialects under development include Alabama, Brooklyn, and Boston.
- * Price: $5000 (unconfirmed)
- * Availability:
- Eloquent Technology, Inc.
- 2389 North Triphammer Road
- Ithaca, NY 14850
- Ph: (607) 266-7025 Fax: (607) 266-7030
- Email: eti@plab.dmll.cornell.edu
-
- JSRU
- * Platform: UNIX and PC
- * Cost: 100 pounds sterling (for academic institutions and
- industry)
- * Description: A C version of the JSRU system, Version 2.3 is
- available. It's written in Turbo C but runs on most Unix systems
- with very little modification. A Form of Agreement must be signed
- to say that the software is required for research and development
- only.
- * Contact: Dr. E. Lewis (eric.lewis@uk.ac.bristol)
-
- Klatt-style synthesiser
- * Platform: Unix
- * Cost: Free
- * Description: Software posted to comp.speech in late 1992.
- * Availability: By anonymous ftp from the comp.speech archives
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources/klatt-0.02.tar.Z
-
- DECtalk
- * Description: Speech synthesis hardware and software. Detailed
- information on DECtalk and other DEC products is available on a
- World-Wide Web site.
- + http://www.digital.com/info.html
- For specific information on DECtalk, check out this www url:
- + http://www.digital.com/archive/pub/Digital/info/Customer-Update/940620005.txt
-
- Speech Manager and PlainTalk
- * Platform: Macintosh
- * Cost: Free
- * Description: Apple's new text-to-speech system extension(s) that
- enable applications (listed below) to perform text-to-speech
- conversion. The Speech Manager runs on most Macs, but PlainTalk
- (and the high quality voices) requires a 68020 Mac or better.
- * Availability: By anonymous ftp from:
- + ftp://ftp.apple.com/dts/mac/sys.soft/speech
- There are 3 files in this directory:
-
- 6273632 Aug 14 22:51 macintalk-pro.hqx
- PlainTalk Text-To-Speech 1.0 speech synthesizer extension
- (includes Female Voice, Compressed); TTS Female Voice;
- TTS Male Voice; and TTS Male Voice, Compressed. Requires
- 68020 or better!
-
- 370108 Aug 13 04:30 speech-manager-docs.hqx
- Apple DocViewer format (Inside Macintosh style, no
- installation instructions - just drag everything onto
- your closed System Folder).
-
- 262569 Aug 7 07:01 speech-manager.hqx
- Speech Manager 1.1.1 (includes Marvin's voice) and
- MacInTalk Voices 1.1.1 (9 more voices). Runs on most Macs.
-
- Various Mac Speech Output Applications
- * Platform: Macintosh
- * Cost: Free (except for At Ease)
- * Description: Some of the Speech Manager aware text-to-speech
- (TTS) applications, etc. are listed below (there are more on the
- Apple Developer CD-ROMs).
- Application         Source              Comments
- __________________  __________________  _________________________________________________
- AddressSpeech       info-mac            4D talking address book (from Speech Pack 2.0)
- At Ease 2.0         MacWarehouse        Friendly desktop that speaks file names
- At Ease 2.0 WG      MacWarehouse        Friendly desktop that speaks file names
- Eliza 3.1           AOL                 Talking Eliza (Rogerian psych therapist)
- FB speech           Inside Basic Mag, volume 3, no. 6.  FutureBasic demo
- FB Speech demo      Inside Basic Mag, volume 3, no. 7.  FutureBasic demo
- Fortune 1.1         info-mac            Like a talking UNIX fortune command - slick
- Homer 0.92d9        zaphod.ee.pitt.edu  GUI IRC client, assign nicks voices - slick
- MacMessage 1.0      FirstClassBBS       Share talking messages/customizable startup
- Say                 info-mac            MPW Tool which converts standard input to speech
- ScriptTools 1.2     info-mac            Write AppleScript scripts to say text messages
- Siege Watch 1.01f   info-mac            Wryly political speaking clock
- SoToSpeak1.0.0b10   info-mac            Two voice conversation (also see Fortune's About)
- Speak It!           info-mac            Type in a message and have it spoken
- Speaker 1.11        info-mac            Simple text file editor, speaks on CR, macros
- Speecher 1.2.1      info-mac            Customizable word pronunciation/substitution
- SpeechManagerdemo   info-mac            Command line interface, C source, aka -explorer
- Speech Pack 2.0     info-mac            4th Dimension external, add speech to database
- SpeechUnitEx        info-mac            Pascal source code for speech in Lab 7
- speek-02b           info-mac            Speech XCMD for HyperCard
- TalkingClockPro2.0  info-mac            AppleScriptable talking clock extension (2.0b0)
- TeachText 7.2       AV Mac              Apple's talking TeachText (simple editor w/QT)
- Tex-Edit 1.9        AOL                 Talking word processor, McSink like, modeming
- VoiceDemo 1.0.1     info-mac            Bare bones phrase talker
- Welcome!v1.3.1      info-mac            A talking Welcome to Macintosh startup
- ?                   ?                   Talking Plug-In-Module for MS Word 5,
-                                         experimental, unsupported, buggy, beware!
- Speech Rhythms      AOL                 A cool text file for one of the above apps
- * Sources:
- + AOL = America Online
- + info-mac = {ftp sumex-aim.stanford.edu, ftp
- wuarchive.wustl.edu, et al.}
- + MacWarehouse = (800) 255-6227
- * Misc: Apple's work in spoken language technologies and systems
- is described in:
- + Lee, Kai-Fu. "The Conversational Computer: An Apple
- Perspective." (Keynote Speech) In Proc. Eurospeech in Berlin,
- September, 1993.
-
- MacinTalk
- * Platform: Macintosh
- * Cost: Free
- * Description: Formant based speech synthesis. There is also a
- program called "tex-edit" which apparently can pronounce English
- sentences reasonably using Macintalk.
- * Note: MacinTalk doesn't run reliably on Macintoshes with new
- sound hardware under the latest OS (System 7.1 w/HUD 2.0). More
- recent software is listed above.
- * Availability: By anonymous ftp from many archive sites (have a
- look on archie if you can). tex-edit is on many of the same sites.
- Try
- + ftp://wuarchive.wustl.edu/mirrors2/info-mac/Old/card/macintalk.hqx
- + ftp://wuarchive.wustl.edu/mirrors2/info-mac/Old/card/macintalk-stack.hqx
- + ftp://wuarchive.wustl.edu/mirrors2/info-mac/app/tex-edit-15.hqx
-
- Monologue by Creative Labs
- * Platform: PC Windows plus SoundBlaster 16
- * Cost: $99.00 or free with some MultiMedia packages
- * Description: Phoneme based speech synthesis software which
- provides output on Sound Blaster compatible audio cards. It
- includes a dictionary of words that are "exceptions" together with
- a dictionary manager for modifying those words. It can be used
- as a stand alone program with Windows' Clipboard or as a DDE
- server dynamically linked (DLL) to a program you write.
- * Contact:
- Creative Labs Inc.
- 1901 McCarthy Boul, Milpitas, CA 95035, USA
- Tel: 408-428-6622 Fax: 408-428-6633 BBS: 408-428-6660
- OR Creative Technology Ltd.
- 67 Ayer Rajah Crescent #03-18, Singapore 0513
- Tel: 65-870-0433 Fax: 65-773-0353 BBS: 65-776-2423
-
- Lernout & Hauspie Text-To-Speech SDK
- * Platform: IBM-Compatible
- * Description: The L&H Text-to-Speech software developers kit
- integrates text-to-speech technology with your own or existing
- PC applications under Microsoft Windows 3.1. The software converts
- written text into clear, human sounding synthetic speech.
- * Requirements: IBM-compatible PC 386 DX (33 MHz) or higher, 8 MB
- RAM, MS DOS 5.0 (or higher), MS Windows 3.1 (or higher), Compiler
- and linker: Microsoft(R) Visual C++ or Borland C++, Windows(TM)
- 3.1 compatible sound card, preferably 16 bit e.g. Soundblaster,
- Windows Sound System, Pro Audio Spectrum
- * Price: Unconfirmed $1,999 per copy, and $499 per each additional
- language (American English, French, German, or Spanish).
- * Contact: USA (617) 932-4118
-
- Tinytalk
- * Platform: PC
- * Description: Shareware speech 'screen reader' package which
- is used by many blind users.
- * Availability: By anonymous ftp
- + ftp://handicap.shel.isc-br.com/speech
- Get the files ttexe166.zip and ttdoc166.zip.
-
- Narrator - narrator.device
- * Platform: Amiga
- * Description: Formant based speech synthesis. Includes an
- English-to-phoneme translation library, and a SPEAK: pseudo-device
- for speech output.
- * Hardware: Standard Amiga hardware
- * Availability: Part of AmigaOS
-
- Infovox Product Range
- * Description: Multilingual Text-to-speech systems, languages
- available: American English, British English, German, French,
- Spanish, Italian, Swedish, Norwegian, Icelandic, Danish and
- Finnish.
-
- * Product name: INFOVOX 500, PC BOARD
- + Product description: Half length expansion board for IBM PC,
- XT, AT, PS/2 model 30 or compatible personal computers. The
- board can also be connected via the serial port. Language and
- control program for downloading into RAM or mounted on
- EPROMs.
- + Platform: for IBM PC, XT, AT, PS/2 model 30 or compatible
- * Product name: INFOVOX 600, OEM BOARD
- + Product description: OEM board built with CMOS IC's. Language
- and control program are stored in on-board fixed memory.
- + Platform: any, Interface: 9-pole D-SUB (RS 232-C) 300-9600
- Baud
- * Product name: INFOVOX 700, DESKTOP UNIT
- + Product description: Desktop unit with built in Infovox 600
- to be connected to any computer or terminal via an RS 232-C
- serial interface. Built in loudspeaker and rechargeable
- battery for 4 hours use, and control knobs for continuous
- control of speech volume and speed.
- + Platform: any
- * Product name: INFOVOX 650, OEM BOARD
- + Product description: OEM-board built with CMOS IC's. Language
- and control program are stored in on-board memory.
- + Platform: any, Interface: 9 pole D-SUB (RS 232-C) 300-9600
- Baud
- * Product name: INFOVOX 750, DESKTOP UNIT
- + Product description: Desktop unit with built in Infovox 650
- to be connected to any computer or terminal via an RS 232-C
- serial interface. Built in loudspeaker and rechargeable
- battery for 5 hours use, and a control knob for continuous
- control of speech volume.
- + Platform: any
- * Misc: Infovox multi-lingual Text-to-Speech Technologies can
- interface with Apple's PlainTalk System. It enables Apple Third
- party developers to write application software with synthetic
- speech output using their usual Apple Plain Talk Text-to-Speech
- interface. Software already written for the English speaking
- market using Apple Plain Talk can be now distributed worldwide,
- provided message strings are translated.
- * Contact:
- Telia Promotor Infovox AB
- TTS Sales Division
- P.O. Box 2069
- S-171 02 Solna, Sweden
- Ph: +46 8 764 35 00 Fax: +46 8 735 78 76
- email: tts-sales@infovox.se
-
- SIMTEL-20
- * The following is a list of speech related software available from
- SIMTEL-20 and its mirror sites for PCs.
- * The SIMTEL internet address is WSMR-SIMTEL20.Army.Mil
- [192.88.110.20] Try looking at your nearest archive site first.
- Directory PD1: MSDOS.VOICE
- Filename      Type  Length  Date    Description
- ================================================================================
- AUTOTALK.ARC  B     23618   881216  Digitized speech for the PC
- CVOICE.ARC    B     21335   891113  Tells time via voice response on PC
- HEARTYPE.ARC  B     10112   880422  Hear what you are typing, crude voice synth.
- HELPME2.ARC   B     8031    871130  Voice cries out 'Help Me!' from PC speaker
- SAY.ARC       B     20224   860330  Computer Speech - using phonemes
- SPEECH98.ZIP  B     41003   910628  Build speech (voice) on PC using 98 phonemes
- TALK.ARC      B     8576    861109  BASIC program to demo talking on a PC speaker
- TRAN.ARC      B     39766   890715  Repeats typed text in digital voice
- VDIGIT.ZIP    B     196284  901223  Toolkit: Add digitized voice to your programs
- VGREET.ARC    B     45281   900117  Voice says good morning/afternoon/evening
-
- _________________________________________________________________
-
-
- ===========================================================================
-
-
- FAQ SECTION 6 - Speech Recognition
-
- Q6.1: WHAT IS SPEECH RECOGNITION?
-
- Automatic speech recognition is the process by which a computer maps
- an acoustic speech signal to text.
-
- Automatic speech understanding is the process by which a computer maps
- an acoustic speech signal to some form of abstract meaning of the
- speech.
- _________________________________________________________________
-
- Q6.2: HOW CAN I BUILD A VERY SIMPLE SPEECH RECOGNISER?
-
- Doug Danforth provides a detailed account in article 253 in the
- comp.speech archives. A summary is provided below. It is also
- available by anonymous ftp
- * ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition
-
- QUICKY RECOGNIZER sketch:
-
- Here is a simple recognizer that should give you 85%+ recognition
- accuracy. The accuracy is a function of the words you have in your
- vocabulary. Long distinct words are easy. Short similar words are
- hard. You can get 98%+ on the digits with this recognizer.
-
- Overview:
- * Find the beginning and end of the utterance.
- * Filter the raw signal into frequency bands.
- * Cut the utterance into a fixed number of segments.
- * Average data for each band in each segment.
- * Store this pattern with its name.
- * Collect a training set of about 3 repetitions of each pattern
- (word).
- * Recognize unknown by comparing its pattern against all patterns in
- the training set and returning the name of the pattern closest to
- the unknown.
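-
- The steps above can be sketched in Python roughly as follows. This is
- one possible reading of the outline, not Doug Danforth's code: the
- frame size, energy threshold, band frequencies, number of segments and
- all function names are assumptions, and the Goertzel recurrence is
- used here as a simple stand-in for a filter bank.
-
```python
import math

FRAME = 80  # samples per analysis frame (10 ms at 8 kHz); an assumed value


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def find_endpoints(samples, threshold=0.01):
    """Return (start, end): the first and last frames above the threshold."""
    loud = [i for i in range(0, len(samples) - FRAME + 1, FRAME)
            if frame_energy(samples[i:i + FRAME]) > threshold]
    if not loud:
        return 0, len(samples)
    return loud[0], loud[-1] + FRAME


def goertzel_power(frame, freq, rate):
    """Signal power near `freq` Hz, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in frame:
        s1, s2 = x + coeff * s1 - s2, s1
    return max(s1 * s1 + s2 * s2 - coeff * s1 * s2, 0.0)


def make_pattern(samples, rate=8000, bands=(300, 600, 1200, 2400), segments=4):
    """Reduce an utterance to segments * len(bands) averaged log band energies."""
    start, end = find_endpoints(samples)
    frames = [samples[i:i + FRAME] for i in range(start, end - FRAME + 1, FRAME)]
    pattern = []
    for k in range(segments):
        # Frames belonging to segment k (an equal share of the utterance).
        lo = k * len(frames) // segments
        hi = (k + 1) * len(frames) // segments
        part = frames[lo:hi]
        for freq in bands:
            total = sum(goertzel_power(f, freq, rate) for f in part)
            mean = total / len(part) if part else 0.0
            pattern.append(math.log(mean + 1e-9))
    return pattern


def distance(a, b):
    """Euclidean distance between two patterns of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognise(samples, training):
    """Return the name of the stored (name, pattern) pair closest to `samples`."""
    unknown = make_pattern(samples)
    return min(training, key=lambda item: distance(unknown, item[1]))[0]


def tone(freq, amp, rate=8000):
    """Synthetic test 'word': a sine burst with silence before and after."""
    body = [amp * math.sin(2 * math.pi * freq * n / rate) for n in range(1600)]
    return [0.0] * 400 + body + [0.0] * 400


training = [("low", make_pattern(tone(300, 0.5))),
            ("high", make_pattern(tone(2400, 0.5)))]
print(recognise(tone(300, 0.4), training))
```
-
- The demo at the end trains on two synthetic tones rather than real
- speech, purely so that the sketch can be run without audio hardware.
- A real training set would hold about 3 patterns per word, as noted
- above.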
-
- Many variations upon the theme can be made to improve the performance.
- Try different filtering of the raw signal and different processing
- methods.
-
- Q6.8 contains information on public domain speech recognition
- software: Lotec and Myers' Hidden Markov Model software.
- _________________________________________________________________
-
- Q6.3: WHAT DOES SPEAKER DEPENDENT/ADAPTIVE/INDEPENDENT MEAN?
-
- A speaker dependent system is developed to operate for a single
- speaker. These systems are usually easier to develop, cheaper to buy
- and more accurate, but not as flexible as speaker adaptive or speaker
- independent systems.
-
- A speaker independent system is developed to operate for any speaker
- of a particular type (e.g. American English). These systems are the
- most difficult to develop and the most expensive, and their accuracy is
- lower than that of speaker dependent systems. However, they are more flexible.
-
- A speaker adaptive system is developed to adapt its operation to the
- characteristics of new speakers. Its difficulty lies somewhere
- between speaker independent and speaker dependent systems.
- _________________________________________________________________
-
- Q6.4: WHAT DOES SMALL/MEDIUM/LARGE/VERY-LARGE VOCABULARY MEAN?
-
- The size of vocabulary of a speech recognition system affects the
- complexity, processing requirements and the accuracy of the system.
- Some applications only require a few words (e.g. numbers only), others
- require very large dictionaries (e.g. dictation machines). There are
- no established definitions, but the following is a rough guide:
- * small vocabulary - tens of words
- * medium vocabulary - hundreds of words
- * large vocabulary - thousands of words
- * very-large vocabulary - tens of thousands of words.
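-
- Since there are no established definitions, any cut-off values are a
- matter of choice. The sketch below simply encodes the rough guide
- above; the boundaries of 100, 1,000 and 10,000 words and the function
- name are assumptions.
-
```python
def vocabulary_class(n_words):
    """Map a vocabulary size to a rough size class (assumed cut-offs)."""
    if n_words < 100:
        return "small"        # tens of words
    if n_words < 1000:
        return "medium"       # hundreds of words
    if n_words < 10000:
        return "large"        # thousands of words
    return "very-large"       # tens of thousands of words


print(vocabulary_class(10))
print(vocabulary_class(20000))
```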
-
- _________________________________________________________________
-
- Q6.5: WHAT DOES CONTINUOUS SPEECH OR ISOLATED-WORD MEAN?
-
- An isolated-word system operates on single words at a time - requiring
- a pause between saying each word. This is the simplest form of
- recognition to perform because the end points are easier to find and
- the pronunciation of a word tends not to affect others. Thus, because the
- occurrences of words are more consistent, they are easier to recognise.
-
- A continuous speech system operates on speech in which words are
- connected together, i.e. not separated by pauses. Continuous speech is
- more difficult to handle because of a variety of effects. First, it is
- difficult to find the start and end points of words. Another problem
- is "coarticulation". The production of each phoneme is affected by the
- production of surrounding phonemes, and similarly the start and
- end of words are affected by the preceding and following words. The
- recognition of continuous speech is also affected by the rate of
- speech (fast speech tends to be harder).
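-
- The role of pauses can be illustrated with a small sketch. Given one
- energy value per frame, it reports a word boundary only after a run of
- quiet frames. The function name, threshold and minimum pause length
- are assumptions chosen for the example.
-
```python
def split_on_pauses(energies, threshold=0.01, min_pause=3):
    """Return (start, end) frame index pairs for words separated by pauses.

    `energies` holds one energy value per frame. A word ends only once at
    least `min_pause` consecutive frames fall below `threshold`; `end` is
    exclusive.
    """
    words = []
    start = None   # first frame of the current word, if we are inside one
    quiet = 0      # length of the current run of quiet frames
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause:
                words.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        # Input ended inside a word; drop any trailing quiet frames.
        words.append((start, len(energies) - quiet))
    return words


isolated = [0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
continuous = [1, 1, 1, 1, 1]
print(split_on_pauses(isolated))
print(split_on_pauses(continuous))
```
-
- With isolated words each utterance comes out as its own (start, end)
- pair. With continuous speech there are no long quiet runs, so the
- whole phrase comes back as a single segment and the word boundaries
- have to be found by other means.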
- _________________________________________________________________
-
- Q6.6: HOW IS SPEECH RECOGNITION PERFORMED?
-
- A wide variety of techniques are used to perform speech recognition,
- and there are many types and levels of speech recognition, analysis
- and understanding.
-
- Typically speech recognition starts with the digital sampling of
- speech. The next stage is acoustic signal processing. Most techniques
- include spectral analysis; e.g. LPC analysis, MFCC, cochlea modelling
- and many, many more.
-
- The next stage is recognition of phonemes, groups of phonemes and
- words. This stage can be achieved by many processes such as DTW
- (Dynamic Time Warping), HMM (hidden Markov modelling), NNs (Neural
- Networks), expert systems and combinations of techniques. HMM-based
- systems are currently the most commonly used and most successful
- approach.
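-
- Of the techniques named above, DTW is the simplest to show in a few
- lines. The sketch below is a minimal textbook form of the algorithm,
- assuming each frame is reduced to a single number; real systems
- compare feature vectors and usually constrain the warping path. The
- function name is an assumption.
-
```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))
print(dtw_distance([0, 0], [1, 1]))
```
-
- Two utterances that differ only in speaking rate, e.g. [1, 2, 3] and
- [1, 1, 2, 2, 3, 3], have a DTW distance of zero.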
-
- Most systems utilise some knowledge of the language to aid the
- recognition process.
-
- Some systems try to "understand" speech. That is, they try to convert
- the words into a representation of what the speaker intended to mean
- or achieve by what they said.
- _________________________________________________________________
-
- Q6.7: WHAT ARE SOME GOOD REFERENCES/BOOKS ON SPEECH RECOGNITION?
-
- Some reviews of speech recognition for personal computers:
- * "Seybold Report on Desktop Publishing" published a nine-page,
- head-to-head comparison of Dragon's DOS software with IBM's OS/2
- software. March 7, 1994; Volume 8, Number 7; Pages 3-11;
- ISSN:0889-9762; Seybold Publications, P.O. Box 644, Media, PA
- 19063 USA, phone (610) 565-2480.
- * McGraw-Hill Inc.'s "BYTE, the Magazine of Technology Integration,"
- published a two-page review of IBM's Personal Dictation System
- software. May 1994; Volume ?, Number ?; Pages 145-146;
- ISSN:0360-5280; Editorial, Executive, and Circulation address: One
- Phoenix Mill Lane, Peterborough, NH 03458 USA, phone ?
-
- Some general introduction books on speech recognition technology:
- * Fundamentals of Speech Recognition; Lawrence Rabiner & Biing-Hwang
- Juang. Englewood Cliffs NJ: PTR Prentice Hall (Signal Processing
- Series), c1993. ISBN 0-13-015157-2
- * Speech recognition by machine; W.A. Ainsworth. London: Peregrinus
- for the Institution of Electrical Engineers, c1988
- * Speech synthesis and recognition; J.N. Holmes. Wokingham: Van
- Nostrand Reinhold, c1988
- * Douglas O'Shaughnessy -- Speech Communication: Human and Machine
- Addison Wesley series in Electrical Engineering: Digital Signal
- Processing, 1987.
- * Electronic speech recognition: techniques, technology and
- applications edited by Geoff Bristow, London: Collins, 1986
- * Readings in Speech Recognition; edited by Alex Waibel & Kai-Fu
- Lee. San Mateo: Morgan Kaufmann, c1990
-
- More specific books/articles:
- * Hidden Markov models for speech recognition; X.D. Huang, Y. Ariki,
- M.A. Jack. Edinburgh: Edinburgh University Press, c1990
- * Automatic speech recognition: the development of the SPHINX
- system; by Kai-Fu Lee; Boston; London: Kluwer Academic, c1989
- * Prosody and speech recognition; Alex Waibel (Pitman: London)
- (Morgan Kaufmann: San Mateo, Calif) 1988
- * S. E. Levinson, L. R. Rabiner and M. M. Sondhi, "An Introduction
- to the Application of the Theory of Probabilistic Functions of a
- Markov Process to Automatic Speech Recognition" in Bell Syst.
- Tech. Jnl. v62(4), pp1035--1074, April 1983
- * R. P. Lippmann, "Review of Neural Networks for Speech
- Recognition", in Neural Computation, v1(1), pp 1-38, 1989.
-
- _________________________________________________________________
-
- Q6.8: WHAT SPEECH RECOGNITION PACKAGES ARE AVAILABLE?
-
- The following packages are presented in no particular order.
-
- HM2007 - Speech Recognition Chip
- * Description: HM2007 is a 48-pin single chip CMOS voice
- recognition LSI circuit with on-chip analog front end, voice
- analysis, recognition process and system control functions. A 40
- word isolated-word voice recognition system can be composed of an
- external microphone, keyboard, SRAM and a few other components.
- When combined with a microprocessor, an intelligent recognition
- system can be built. A demo board for this chip is being
- distributed by The Summa Group.
- * Cost: Approx US$30 for the HM2007 and US$100 for the demo board.
- * Contact:
- The Summa Group Limited
- One California Street, Suite #1940,
- San Francisco, CA 94111
- Ph: (415) 288-0390
-
- Voice Blaster Ver. 4.0
- * Platform: IBM AT or higher, DOS or Windows 3.1
- * Description: Uses a Sound Blaster or compatible board. Contains
- a microphone headset and a connector for LPT1:. A printer can
- still be used on LPT1:. Will recognize 1024 words that are trained
- by the operator. Each word activates a macro that can enter an
- ASCII word on the screen or into a word processor or invoke a
- batch file. An optional footswitch may be installed. Software to
- run under DOS or Windows 3.1 is included.
- * Cost: Around $150 Canadian.
- * Contact:
- COVOX Inc.
- 675 Conger Street
- Eugene, Oregon, 97402, USA
- Ph: (503) 342-1271 Fax: (503) 342-1283
- BBS: (503) 342-4135
-
- Votan
- * Platform: MS-DOS, SCO UNIX
- * Description: Isolated word and continuous speech modes, speaker
- dependent and (limited) speaker independent. Vocab size is 255
- words or up to a fixed memory limit - but it is possible to
- dynamically load different words for an effectively unlimited
- number of words.
- * Rough Cost: Approx US $1,000-$1,500
- * Requirements: Cost includes one Votan Voice Recognition ISA-bus
- board for 386/486-based machines. A software development system is
- also available for DOS and Unix.
- * Misc: Up to 8 Votan boards may co-exist for 8 simultaneous voice
- users. A telephone interface is also available. There is also a
- 4GL and a software development system. Apparently there is more
- than one version - more info required.
- * Contact: 800-877-4756, 510-426-5600
-
- Entropic's HTK (HMM Toolkit)
- * Platform: Range of Unix platforms.
- * Description: HTK is a software toolkit for building continuous
- density HMM based speech recognisers. It consists of a number of
- library modules and a number of tools. Functions include speech
- analysis, training tools, recognition tools, results analysis, and
- an interactive tool for speech labelling. Many standard forms of
- continuous density HMM are possible. Can perform isolated word or
- connected word speech recognition. It can model whole words or
- sub-word units. Can perform speaker verification and other pattern
- recognition work using HMMs. HTK is now integrated with the
- ESPS/Waves speech research environment which is described in
- Section 1.8.
- * Misc: The availability of HTK changed in early 1993 when
- Entropic obtained exclusive marketing rights to HTK from the
- developers at Cambridge.
- * Cost: On request.
- * Contact:
- Entropic Research Laboratory,
- 600 Pennsylvania Ave, S.E. Suite 202,
- Washington, D.C. 20003, USA
- Phone: (202) 547-1420.
- email - info@wrl.epi.com
-
- DragonDictate version 3.0
- * Platform: PC
- * Description: Speaker-adaptive recognition system for discrete
- speech. Provides 110,000 word dictionary and also allows user to
- add words. Active vocabulary of 5,000, 30,000, or 60,000 words.
- Allows dictation into almost all DOS applications (word
- processors, spreadsheets, etc.) and hands-free operation of the
- PC.
- * Cost: Prices including audio board and high-quality headset
- microphone:
- + US$695 (5,000 word Starter Edition)
- + US$995 (30,000 word Classic Edition)
- + US$1,995 (60,000 word Power Edition)
- * Requirements: Minimum of 33 MHz 486 with 8-16M memory and at
- least 29M disk space (depending on product), one 8-bit slot, DOS
- 5.0 and up (also runs in a DOS box under Windows or OS/2).
- * Contact:
- Dragon Systems, Inc.
- 90 Bridge Street,
- Newton MA 02158, USA
- Tel: 1-617-965-5200, Fax: 1-617-527-0372
-
- DragonDictate for Windows
- * Platform: PC
- * Description: Speech-to-text dictation system. Discrete speech;
- speaker-adaptive. Also provides command/control and mouse
- movement for hands-free operation of Windows. Comes with a 120,000
- word pronunciation dictionary; users can also add their own words
- or phrases. Dictate directly into any application.
- * Rough Cost: Prices including software, documentation and
- microphone:
- + DragonDictate Starter Edition (5,000 words active) -- $395
- + DragonDictate Classic Edition (30,000 words active) -- $695
- + DragonDictate Power Edition (60,000 words active) -- $1,695
- * Requirements: 486/33, 7-10 MB dedicated RAM (depending on
- edition), Windows 3.1 or later. Supported sound boards: Media
- Vision Pro Audio Studio 16, Creative Labs Sound Blaster 16,
- Microsoft Windows Sound System, IBM Audio Capture/Playback
- Adapter.
- * Contact:
- Dragon Systems, Inc.
- 320 Nevada Street
- Newton, MA 02160, USA
- Phone: (617)965-5200 Fax: (617)527-0372
-
- DragonVoiceTools
- * Platform: PC
- * Description: Programmer's toolkit for developing speech-aware
- DOS or Windows applications. Recognizes continuously spoken digits
- and discretely spoken words or phrases. Up to 1,000 words can be
- active at one time. Use words from 110,000 word dictionary
- (included) and/or develop your own word models.
- * Cost:
- + US$1,995 (developer's kit)
- + US$595 (end-user system)
- * Requirements: Minimum of 20 MHz 386 (larger vocabulary requires
- faster processor) with at least 5M memory and at least 19M disk
- space (depending on vocabulary size), DOS 5.0 and up, Windows 3.1
- and up, Borland C or C++ or Microsoft C or C++. Also requires IBM
- M-ACPA card available from IBM or Dragon Systems ($325).
- * Contact:
- Dragon Systems, Inc.
- 90 Bridge Street, Newton MA 02158, USA
- Tel: 1-617-965-5200, Fax: 1-617-527-0372
-
- IBM Personal Dictation System
-
- OR: Osborne Personal Dictation System (in Australia)
- * Platform: Intel I486 & IBM OS/2
- * Description: Speaker-independent, discrete speech dictation with
- navigation. Navigation does not require setup; most applications
- are automatically speech enabled by dynamic control analysis.
- Dictation averages 70WPM with 95% accuracy and uses statistical
- trigram modelling. The base system is 22K words, other
- vocabularies available for specific industries.
- * Requirements: 486SX or above, 16MB RAM, 30MB File space,
- Dictation Adapter
- * Cost: Software $495 (includes mic) / Hardware $495
- * Misc: Based on IBM Tangora Technology
- * Availability: US English. Other languages (UK, FR, GR, IT, and
- ES) available 3Q94.
- * Contact: US Contact 1-800-TALK-2-ME or 1-914-766-9252.
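
The entry above says only that the system "uses statistical trigram
modelling". As a rough illustration of what a trigram language model is
(not IBM's implementation), the sketch below estimates P(w3 | w1, w2)
from counts. The function names, the "<s>" and "</s>" padding symbols
and the add-alpha smoothing are all assumptions made for this example.

```python
from collections import defaultdict


def train_trigrams(sentences):
    """Count trigrams and their two-word contexts over tokenised sentences."""
    tri = defaultdict(int)   # (w1, w2, w3) -> count
    bi = defaultdict(int)    # (w1, w2)     -> count of the context
    for words in sentences:
        # Pad so that the first real word also has a two-word context.
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            tri[context + (padded[i],)] += 1
            bi[context] += 1
    return tri, bi


def trigram_prob(tri, bi, w1, w2, w3, vocab_size, alpha=1.0):
    """P(w3 | w1, w2) with add-alpha smoothing (an illustrative choice)."""
    numerator = tri.get((w1, w2, w3), 0) + alpha
    denominator = bi.get((w1, w2), 0) + alpha * vocab_size
    return numerator / denominator


if __name__ == "__main__":
    corpus = [
        ["recognise", "speech"],
        ["recognise", "speech"],
        ["wreck", "a", "nice", "beach"],
    ]
    tri, bi = train_trigrams(corpus)
    # A recogniser would use such scores to prefer likely word sequences.
    print(trigram_prob(tri, bi, "<s>", "recognise", "speech", vocab_size=7))
    print(trigram_prob(tri, bi, "<s>", "recognise", "beach", vocab_size=7))
```

In a dictation system, scores of this kind are combined with acoustic
scores so that acoustically similar candidates are ranked by how likely
they are to follow the two preceding words.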
-
- VoiceServer for Windows
- * Platform: PC
- * Description: Speaker dependent, each with an independent
- directory. Isolated word. Up to 1000 words/user, 300 words/window.
- 1 word occupies 2Kb on hard disk. Can be used to control Windows
- applications by issuing voice commands instead of menu selection.
- * Rough Cost: 292 Pounds(UK)
- * Requirements: None
- * Misc: Price includes a half-sized AT voice card (including a
- DSP), software, documentation & a microphone (attachable to
- keyboard or speaker). A light-weight high-spec headset is an
- optional extra.
- * Contact:
- Mark Redwood
- Applied Voice Technologies
- 26 Danbury Street, Islington,
- London, UK, N1 8JU
- Ph: + 44 71 454 1224 : Fax: + 44 71 454 1225
-
- IN3 Voice Command for Windows
- * Platform: PC with Windows 3.1
- * Description: IN3 is now available for MS-Windows. Users can call
- applications to the foreground with voice commands. Once the
- application is called, the user may enter commands and data with
- voice commands. Voice macros can reduce the strain of repetitive
- stress injuries (RSI) such as Carpal Tunnel Syndrome (CTS) by
- replacing heavy repetitive keyboard hammering with simple voice
- operations. Voice macros take complex operations and reduce them
- to simple verbal commands. Voice input can provide new facilities
- for tasks which could not easily have been otherwise performed
- without the multiple axis of input. IN3 is hardware-independent;
- users with any Windows-compatible audio board can add speech recognition
- to the desktop. IN3 works with either 8 bit or 16 bit Windows audio
- boards. IN3 is based on continuous word-spotting technology. A
- developer API is also available for creating voice-enabled
- applications.
- * Price: $179 U.S.
- * Requirements: PC with 80386 processor or better, Microsoft
- Windows 3.1, and Windows compatible audio system with microphone.
- * Misc: Fully functional demos are available on Compuserve in
- various Multimedia and CAD forums. Demos are also available from
- "America on Line", the comp.binaries.ms-windows archive sites, and
- various BBS systems. It is also available by anonymous ftp
- + ftp://ftp.wustl.edu/usenet/comp.binaries.ms-windows/v3/in3demo.zip
- + ftp://ftp.uwasa.fi/mirror/ultrasound/demo/in3demo.zip
- An equivalent Sun product is described below.
- * Contact:
- Brantley Kelly
- Email: cbk@gacc.atl.ga.us CIS: 75120,431
- FAX: 1-404-925-7924 Phone: 1-404-925-7950
- Command Corp. Inc, 3675 Crestwood Parkway, Duluth GA 30136, USA
-
- IN3 Voice Command
- * Platform: Sun SPARCstation
- * Description: IN3 provides a secure, robust, word spotting,
- continuous speech recognition facility for the Sun OS or Solaris
- operating systems. The recognition system is a secure operating
- system facility capable of working with various interfaces,
- microphones, and devices. The operating system interface works
- with native UNIX outside of X Windows as well as provides enhanced
- X Windows facilities including named window support. The user
- interface provides a means to quickly create commands on the fly
- for replacing long strings and complex operations with voice
- macros. [Voice macros can reduce the strain of repetitive stress
- injuries (RSI) such as Carpal Tunnel Syndrome (CTS) by replacing
- heavy repetitive keyboard hammering with simple voice operations.]
- The IN3 user interface works with generic X servers and window
- managers. A developer API is also available for creating voice-
- enabled applications, interfacing with other audio sources, and
- providing extensive application control over the recognition
- facility.
- * Availability: SunSite archive at SunSITE.unc.edu as well as on
- Catalyst CDware as both a runnable demo and unlockable software.
- * Hardware Required: Sun SPARCstation with audio input. Noise
- canceling microphone recommended but not required.
- * Software Required:
- + Sun OS 4.1.2 with OpenWindows 3.0
- + or, Sun OS 4.1.3
- + or, Solaris 2.1 or Solaris 2.2
- * Misc: An equivalent MS-Windows product is described above.
- * Price: $495 U.S.
- * Contact:
- Brantley Kelly
- Email: cbk@gacc.atl.ga.us CIS: 75120,431
- FAX: 1-404-925-7924 Phone: 1-404-813-8030
- Command Corp. Inc, 3675 Crestwood Parkway, Duluth GA 30136, USA
-
- Phonetic Engine 400 (PE400) - Speech Systems, Inc.
- * Platform: PC
- * Description: Speaker independent, large vocabulary, continuous
- speech recognition for MS Windows or DOS.
- * Rough Cost: $1195 US dollars. Includes board, microphone,
- developer kit, documentation, 2 days of technical training and 90
- days of technical support.
- * Requirements: IBM AT class machine or better plus 5M disk space.
- Most processing is performed on-board (4M standard or 16M
- upgrade).
- * Misc: Requires developer to provide a context-free grammar.
- Vocabulary size unknown (quotes from 500 - 2000 words per
- grammar), but dynamic grammar switching capabilities may increase
- the effective vocabulary size. Development system includes
- lower-level C,C++ library (VoiceLib), higher-level DLL (SPOT)
- callable from many languages, SPOT/VBX, a custom control for
- Visual Basic and Visual C++.
- * Contact:
- Speech Systems, Inc.
- 2945 Center Green Court South
- Boulder, CO 80301-2275, USA
- Tel: 303.938.1110 Fax: 303.938.1874
-
- SayIt
- * Platform: Sun SPARCstation
- * Description: Voice recognition and macro building package for
- Suns in the Openwindows 3.0 environment. Speaker dependent
- discrete speech recognition. Vocabularies can be associated to
- applications and the active vocabulary follows the application
- that has input focus. Macros can include mouse commands,
- keystrokes, Unix commands, sound, Openwindow actions and more. An
- evaluation copy is available by email.
- * Hardware: Microphone required (SunMicrophone is fine).
- * Cost: $US295
- * Contact:
- Phone: 1-800-245-UNIX or 1-415-572-0200
- Fax: 1-415-572-1300
- Email: info@qualix.com
-
- Kurzweil Voice for Windows
- * Platform: MS Windows 3.1
- * Description: Kurzweil Voice for Windows is a dictation product
- enabling the user to create text and enter data by speaking to
- Windows-based applications. System is adaptive but requires no
- initial training. Users can choose either 30,000 or 60,000 word
- active vocabulary. Application command translation templates for
- popular Windows applications such as WordPerfect, 1-2-3, Organizer,
- Word.
- * Cost: US $995
- * Hardware: 486DX/33 or higher, 8 or 16 MB dedicated memory
- (depends on vocabulary), 30 MB dedicated disk space, VGA or
- higher, Kurzweil-supplied microphone and DSP board.
- * Contact:
- Phone: 1-800-380-1234
- Email: info@kurz-ai.com
-
- D6006 Voice Control Processor
- * Platform: ?
- * Description: ?
- * Contact:
- DSP Telecommunications Inc.
- 2855 Kifer Road, Suite 202, Santa Clara CA 95051, USA
- Tel:(408)986-4310
- Fax:(408)986-4324
-
- Speech Commander - Listen for Windows
- * Platform: ?
- * Description: ?
- * Contact:
- Verbex Voice Systems
- 1090 King Georges Post Rd., Bldg 107,
- Edison NJ 08837, USA
- Tel:(908)225-5225
- Fax:(908)225-7764
-
- Voice-Trek 2.0
- * Platform: ?
- * Description: ?
- * Contact:
- Tardis Technology Inc., Voice Recognition Div.
- 10321 Los Alamitos Blvd., Los Alamitos CA 90720
- Tel:(310)799-3355 Fax:(310)799-3360
-
- Visus SpeechKit
- * Platform: NeXT
- * Description: SpeechKit is based on SPHINX, a
- speaker-independent, 1000 word or so, continuous speech
- recognition system which allows you to incorporate speech
- recognition into your applications. You can design your vocabulary
- and grammars.
- * Contact: Visus - no address or phone provided. A possible
- contact is Robert Brennan at Carnegie Mellon University. email:
- Robert_Brennan@cmu.edu
-
- recnet
- * Platform: UNIX
- * Description: Speech recognition for the speaker independent
- TIMIT and Resource Management tasks. It uses recurrent networks to
- estimate phone probabilities and Markov models to find the most
- probable sequence of phones or words. The system is a snapshot of
- evolving research code. There is no documentation other than
- published research papers. The components are:
- + A preprocessor which implements many standard and many non-
- standard front end processing techniques.
- + A recurrent net recogniser and parameter files
- + Two Markov model based recognisers, one for phone recognition
- and one for word recognition
- + A dynamic programming scoring package
- The complete system performs competitively.
- * Cost: Free
- * Requirements: TIMIT and Resource Management databases
- * Contact: Tony Robinson: ajr@eng.cam.ac.uk
- * Availability: by anonymous ftp
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/misc/recnet-1.3.tar.Z
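
The recnet description pairs per-frame phone probabilities with Markov
models that "find the most probable sequence of phones or words". A
standard way to carry out that search is Viterbi decoding; the sketch
below shows the idea in the log domain. It is not taken from the recnet
code: the function name, the argument layout and the decision to treat
the network outputs directly as emission scores are assumptions made for
illustration.

```python
import math


def viterbi(log_emit, log_trans, log_init):
    """Return the most probable state sequence as a list of state indices.

    log_emit[t][s]  : log score of frame t under state s
    log_trans[p][s] : log P(state s at time t | state p at time t-1)
    log_init[s]     : log P(state s at time 0)
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at time t
    for t in range(1, len(log_emit)):
        prev = score
        score, pointers = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda p: prev[p] + log_trans[p][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[t][s])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    log = math.log
    # Two hypothetical phones, four frames; values are made up.
    emit = [[log(0.9), log(0.1)], [log(0.8), log(0.2)],
            [log(0.2), log(0.8)], [log(0.1), log(0.9)]]
    trans = [[log(0.7), log(0.3)], [log(0.3), log(0.7)]]
    init = [log(0.5), log(0.5)]
    print(viterbi(emit, trans, init))
```

Working in log probabilities avoids numerical underflow on long
utterances, since products of many small probabilities become sums.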
-
- Lotec Speech Recognition Package
- * Platform: Sun
- * Description: Public domain speech recognition software. Operates
- from input in Sun audio format (.au files) and outputs word
- hypotheses and time labelling data. The software includes programs
- to collect speech samples, a labeller, a "featurizer" which
- parameterises speech files, a word spotter and the recogniser. The
- software can perform real time recognition on a Sparc 10 for small
- vocabularies.
- * Requirements: Sun SPARC audio input and a "decent" microphone,
- Sun multimedia demo software (in /usr/demo/SOUND) and X.
- * Availability: By anonymous ftp
- + ftp://ftp.sanpo.t.u-tokyo.ac.jp/pub/nigel/lotec/lotec.tar.Z
- * Contact: Nigel Ward: nigel@sanpo.t.u-tokyo.ac.jp
-
- Myers' Hidden Markov Model software
- * Description: Hidden Markov model software for automatic speech
- recognition. C++ code that implements a basic left-right hidden
- Markov model and corresponding Baum-Welch (ML) training algorithm.
- It is meant as an example of the HMM algorithms described by
- L.Rabiner and others. The code was built in order to learn how HMM
- systems work and we are now offering it to the net so that others
- can learn how to use HMMs for speech recognition. Keep in mind
- that ease of understanding was our primary concern, not
- efficiency. The code can be used to build experimental speech
- recognition systems using "train_hmm" and "test_hmm", and can be
- used in conjunction with written tutorials on HMMs to understand
- how they work.
- * Availability: By anonymous ftp from the comp.speech archive
- site. There are three files in the directory
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/sources
- The files are
- + hmm.README
- + hmm-1.0.tar.Z
- + OR, hmm-1.0.tar.gz
- (Note: hmm-1.0.tar.Z and hmm-1.0.tar.gz are compressed and GNU
- compressed versions of the same files)
- * Contact: Richard Myers: email rmyers@ics.uci.edu
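
For readers who want a feel for the model before downloading the C++
code, the following sketch computes the likelihood of an observation
sequence under a small left-right HMM using the forward algorithm, the
recursion that Baum-Welch training builds on. It is not derived from the
Myers package: the discrete observation symbols, the function name, the
fixed start in state 0 and the sum over all final states are assumptions
chosen to keep the example short.

```python
def forward_likelihood(obs, trans, emit):
    """P(obs | model) for a discrete left-right HMM that starts in state 0.

    obs         : list of observation symbol indices
    trans[i][j] : P(state j at t+1 | state i at t); in a left-right model
                  trans[i][j] is zero whenever j < i
    emit[i][k]  : P(symbol k | state i)
    """
    n = len(trans)
    # Initialisation: all probability mass starts in the first state.
    alpha = [0.0] * n
    alpha[0] = emit[0][obs[0]]
    # Induction: propagate through the transitions, then weight by emission.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    # Termination: any state may be final in this simplified sketch.
    return sum(alpha)


if __name__ == "__main__":
    # Two states; state 0 may stay or move right, state 1 only loops.
    trans = [[0.5, 0.5],
             [0.0, 1.0]]
    emit = [[0.9, 0.1],
            [0.2, 0.8]]
    print(forward_likelihood([0, 1], trans, emit))
```

In an isolated-word recogniser, one such model is trained per word and
an unknown utterance is assigned to the word whose model gives the
highest likelihood. Real systems scale or use logarithms to avoid
underflow on long sequences.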
-
- Voice Command Line Interface
- * Platform: Amiga
- * Description: VCLI will execute CLI commands, ARexx commands, or
- ARexx scripts by voice command through your audio digitizer. VCLI
- allows you to launch multiple applications or control any program
- with an ARexx capability entirely by spoken voice command. VCLI is
- fully multitasking and will run in the background, continuously
- listening for your voice commands even while other programs are
- running. Documentation is provided in AmigaGuide format. VCLI 6.0
- runs under either Amiga DOS 2.0 or 3.0.
- * Cost: Free?
- * Requirements: Supports the DSS8, PerfectSound 3, Sound Master,
- Sound Magic, and Generic audio digitizers.
- * Availability: by ftp from wuarchive.wustl.edu in the file
- systems/amiga/incoming/audio/VCLI60.lha and from
- amiga.physik.unizh.ch as the file pub/aminet/util/misc/VCLI60.lha
- * Contact: Author's email is RHorne@cup.portal.com
-
- DATAVOX - French
- * Platform: PC
- * Description: Continuous speech - speaker independent or
- dependent.
- * Rough Cost: ?
- * Requirements: 2 PC format boards (RdF1000 and TdS 96/25) and an
- A/D - D/A module (ASA116)
- * Misc: Application software may communicate with DATAVOX through
- two types of interface:
- + Keyboard overlay: The application software may be used with
- any PC compatible package. No specific adaptation is
- necessary, you only need to define your configuration with
- the application software.
- + C library: Allows a user-written program to drive the
- recognition system.
- DATAVOX is based on the AMADEUS speech recognition software developed
- at LIMSI. It provides
- + Continuous speech recognition with 500 words speaker
- dependent, 50 words speaker independent (custom-made
- vocabulary).
- + Grammar of the application language (syntax acquisition,
- verification and simplification software).
- + Large vocabulary: DATAVOX can recognize vocabularies of
- several thousand words as long as there are no more than 500
- words in the active vocabulary at any given node. It takes
- less than 1 second to change syntax and vocabulary.
- + Training controlled by the system (use of co-articulation
- models).
- + Response time less than 500 ms for any phrase length.
- + Synthesis (ADPCM) can be heard simultaneously while
- recognition is being carried out.
- * Contact:
- VECSYS
- Le Chene rond, 91570 Bievres, France
- Fax: 33 1 69 41 24 30
- Voice: 33 1 69 41 15 04
-
- PowerSecretary
- * Platform: Centris 650, 660AV. Quadra 650, 660AV, 700, 800, 840AV,
- 900, 950.
- * Description: Speaker dependent/adaptive system requiring words
- to be separated by short pauses.
- * Vocabulary: 30,000 at any one time, automatically selected from
- 120,000-word dictionary.
- * Cost: US$2,495; non-AV machines also need an audio board, which
- costs about US$300.
- * Requirements: Minimum of 16M of RAM and System 7.0.
- * Contact:
- Articulate Systems
- 600 W. Cummings Park, Suite 4500
- Woburn, MA 01801
- Ph: (617) 935-5656 Fax: (617) 935-0490.
-
- ICSS system from IBM
- * Description: A large vocabulary, speaker independent, continuous
- speech system which runs under Windows, OS/2, and AIX.
- * Requirements: Soundboard (e.g. Soundblaster)
- * Price: ?
- * Contact: ?
-
- Creative VoiceAssist
- * Platform: PC (?)
- * Price: $US99.95
- * Contact:
- Creative Labs
- Ph: 1-800-998-5227
-
- _________________________________________________________________
-
-
-
-
- Andrew Hunt
- ---
- Speech Technology Research Group Ph: 61-2-351 4509
- Dept. of Electrical Engineering Fax: 61-2-351 3847
- University of Sydney, NSW, 2006, Australia email: andrewh@speech.su.oz.au
-
-