- Path: senator-bedfellow.mit.edu!faqserv
- From: andrew.hunt@east.sun.com (Andrew Hunt)
- Newsgroups: comp.speech,comp.answers,news.answers
- Subject: comp.speech Frequently Asked Questions - part 3/3
- Supersedes: <comp-speech-faq/part3_897652698@rtfm.mit.edu>
- Followup-To: comp.speech
- Date: 12 Jul 1998 12:00:30 GMT
- Organization: Speech Applications Group, Sun Microsystems Laboratories
- Lines: 4577
- Approved: news-answers-request@MIT.Edu
- Expires: 23 Aug 1998 12:00:04 GMT
- Message-ID: <comp-speech-faq/part3_900244804@rtfm.mit.edu>
- References: <comp-speech-faq/part1_900244804@rtfm.mit.edu>
- Reply-To: andrew.hunt@east.sun.com (Andrew Hunt)
- NNTP-Posting-Host: penguin-lust.mit.edu
- Summary: Information on Speech Technology
- X-Last-Updated: 1998/07/08
- Originator: faqserv@penguin-lust.MIT.EDU
- Xref: senator-bedfellow.mit.edu comp.speech:18457 comp.answers:32123 news.answers:134644
-
- Archive-name: comp-speech-faq/part3
- Last-modified: 1998/07/06
- URL: http://www.speech.su.oz.au/comp.speech/
-
- COMP.SPEECH FAQ POSTING - PART 3/3
-
-
- [Note: this document has been automatically extracted from a WWW site:
- http://www.speech.su.oz.au/comp.speech/
- This may introduce some formatting errors.]
-
-
- Speech Synthesis
-
- comp.speech FAQ Section 5
-
- * SpeechLinks: Speech Synthesis
- * Q5.1: What is speech synthesis?
- * Q5.2: How can speech synthesis be performed?
- * Q5.3: References/Books on Synthesis
- * Q5.4: Speech Synthesis on the WWW
- * Q5.5: Speech Synthesis Software/Hardware
-
-
- ___________________________________________________________________________
-
- Q5.1: What is speech synthesis?
-
- Speech synthesis programs convert written input to spoken output by
- automatically generating synthetic speech. Speech synthesis is often
- referred to as "Text-to-Speech" conversion (TTS).
-
-
- ___________________________________________________________________________
-
- Q5.2: How can speech synthesis be performed?
-
- There are several algorithms. The choice depends on the task they're
- used for. The easiest way is to just record the voice of a person
- speaking the desired phrases. This is useful if only a restricted
- volume of phrases and sentences is used, e.g. messages in a train
- station, or schedule information via phone. The quality depends on the
- way recording is done.
-
- More sophisticated, but lower in quality, are algorithms which split
- the speech into smaller pieces. The smaller those units are, the fewer
- of them are needed, but the quality also decreases. An often used unit
- is the phoneme, the smallest linguistic unit. Depending on the language
- used there are about 35-50 phonemes in western European languages,
- i.e. there are 35-50 single recordings. The problem is combining them,
- as fluent speech requires fluent transitions between the elements. The
- intelligibility is therefore lower, but the memory required is small.
-
- A solution to this dilemma is using diphones. Instead of splitting at
- the transitions, the cut is done at the center of the phonemes,
- leaving the transitions themselves intact. With N phonemes this gives
- up to N*N elements (about 400 for N = 20, i.e. 20*20) and the quality
- increases.
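-
- As a rough illustration of the diphone scheme described above, the
- following Python sketch turns a phoneme sequence into diphone units
- and computes the upper bound on the inventory size. The function
- names, the "_" silence marker and the phoneme symbols are assumptions
- made for this example; they are not taken from any particular
- synthesiser.
-
```python
def to_diphones(phonemes, boundary="_"):
    """Return the diphone units that span a phoneme sequence.

    Each unit runs from the middle of one phoneme to the middle of the
    next, so the transition between the two stays inside the unit.
    The sequence is padded with a hypothetical silence marker so that
    the first and last phonemes also get a transition.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def max_diphone_inventory(n_phonemes):
    """Upper bound on phoneme-to-phoneme diphones: every ordered pair."""
    return n_phonemes * n_phonemes


if __name__ == "__main__":
    # Illustrative phoneme symbols only, not a real transcription standard.
    print(to_diphones(["h", "e", "l", "ou"]))
    # ['_-h', 'h-e', 'e-l', 'l-ou', 'ou-_']
    print(max_diphone_inventory(20))  # 400
```
-
- Not every phoneme pair occurs in a given language, so a recorded
- diphone inventory is usually smaller than this upper bound.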
-
- The longer the units become, the more elements there are, but the
- quality increases along with the memory required. Other units which
- are widely used are half-syllables, syllables, words, or combinations
- of them, e.g. word stems and inflectional endings.
-
- The Museum of Speech Analysis and Synthesis has pictures of artificial
- speech systems going back over 150 years: worth a visit. (
- http://mambo.ucsc.edu/psl/smus/smus.html)
-
-
- ___________________________________________________________________________
-
- Q5.3: References/Books on Synthesis
-
- Books and Papers
-
- * Thierry Dutoit, An Introduction to Text-to-Speech Synthesis,
- Kluwer Academic Publishers (Dordrecht), 1997, ISBN 0-7923-4498-7,
- 312 pages. Volume 3 in the series on Text, Speech and Language
- Technology.
- * Douglas O'Shaughnessy, Speech Communication: Human and Machine,
- Addison Wesley series in Electrical Engineering: Digital Signal
- Processing, 1987.
- * T.V. Raman, Auditory User Interfaces -- Toward The Speaking
- Computer, Kluwer Academic Publishers, Boston, ISBN 0-7923-9984-6,
- August 1997, 168 pp.
- * D. H. Klatt, "Review of Text-To-Speech Conversion for English",
- Jnl. of the Acoustical Society of America (JASA), Vol 82, 1987, pp
- 737-793.
- * "Talking Machines, Theories, Models and Designs" Eds, G. Bailly &
- C. Benoit (Elsevier: North Holland)
- * I. H. Witten. Principles of Computer Speech, London: Academic
- Press, Inc., 1982.
- * W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and Synthesis,
- Elsevier, Amsterdam, 1995.
- Contents, preface etc on the WWW:
- http://www.elsevier.nl/section/engtech/scs/menu.htm
- * John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to
- Speech: The MITalk System", Cambridge University Press, 1987.
- * J.P.H. van Santen, R. W. Sproat, J. P. Olive, and J. Hirschberg,
- "Progress in Speech Synthesis", Springer, 1996.
-
- On the WWW
-
- * Survey of the State of the Art in Human Language Technology
- Report edited by Ronald A. Cole et al. with a section on
- Text-to-Speech Technologies.
- http://www.cse.ogi.edu/CSLU/HLTsurvey/ch5node1.html
-
- Bibliographies and Reference Lists
-
- * WWW searchable online bibliography for Phonetics and Speech
- Technology with more than 8000 entries. Provided by Institut für
- Phonetik at Johann Wolfgang Goethe-Universität Frankfurt.
- http://www.uni-frankfurt.de/~ifb/bib_engl.html
- * Computational Speech Processing: Speech Analysis, Recognition,
- Understanding, Compression, Transmission, Coding, Synthesis ; Text
- to Speech Systems, Speech to Tactile Displays, Speaker
- Identification, Prosody Processing : BIBLIOGRAPHY, by Conrad F.
- Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA
- inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
- See also: http://gomer.mlink.net/infolingua.html
-
-
- ___________________________________________________________________________
-
- Q5.4: Speech Synthesis on the WWW
-
- Most of the following are links to WWW pages with demonstrations of
- speech synthesis. Plenty more links are included in the detailed list
- of speech synthesis software/hardware in Q5.5.
-
- Speech Synthesis "Museum"
- URL: http://www.cs.bham.ac.uk/~jpi/synth/museum.html
- Maintained by Jon Iles (j.p.iles@cs.bham.ac.uk) at the
- University of Birmingham.
- Information and speech samples for
-
- + YorkTalk
- + Loughborough Sound Images
- + University of Birmingham - FDFS
- + Eurovocs
- + DECtalk
- + AT&T Bell Labs Synthesiser
- + S.W.A.Ll.C. - Welsh Synthesis from CSTR
- + All-Prosodic Speech Synthesis - IPOX
- + Orator from Bellcore
-
- The Festival Speech Synthesis System
- http://www.cstr.ed.ac.uk/projects/festival.html
- Pre-synthesized examples in English, Welsh and Spanish, and
- online demo of English.
-
- Pavarobotti
- http://www.shc.uiowa.edu/fun/pavarobotti/pavarobotti.html
- WWW demo of the Pavarobotti synthesis technology developed at
- the National Center for Voice and Speech
- (http://www.shc.uiowa.edu/ncvs_home.html).
-
- Say...
- http://wwwtios.cs.utwente.nl/say
- WWW demo of the rsynth speech synthesis software. The WWW
- capability was implemented by Axel Belinfante.
-
- Musée sonore de la synthèse de la Parole en français
- http://www.icp.grenet.fr/exemples_synthese/ex.html
- Speech synthesis examples from a series of French language
- speech synthesisers plus links to other speech synthesis demo
- pages.
-
- + ICP-Grenoble
- + CNET-Lannion (with TD-PSOLA)
- + KTH-Stockholm
- + Universite-Mons - several versions
-
- Lucent Technologies Bell Labs Text-to-Speech
- http://www.bell-labs.com/project/tts/
- Demos and samples of the latest Lucent Technologies Bell Labs
- Text-to-Speech system.
-
- WATSON FlexTalk from AT&T Advanced Speech Products Group
- http://www.att.com/aspg/demo.html
- WWW interface to the WATSON FlexTalk speech synthesis
- demonstration.
-
- AT&T Bell Laboratories Voices
- http://www.research.att.com/cgi-bin/cgiwrap/mjm/voices.cgi
- WWW interface to the AT&T Bell Laboratories text to speech
- (TTS) synthesizer
-
- Laureate from British Telecom
- http://www.labs.bt.com/innovate/speech/laureate/
- Demo of the Laureate speech synthesis system - not yet
- commercially available.
-
- ORATOR from Bellcore
- Online demo of the ORATOR system developed at Bellcore.
- http://www.bellcore.com/ORATOR/
-
- SVOX from TIK, ETH in Zurich
- http://www.tik.ee.ethz.ch/cgi-bin/w3svox
- Demo of German speech synthesis from Institut für Technische
- Informatik und Kommunikationsnetze.
-
- Speech Synthesis Research at OGI
- http://www.cse.ogi.edu/CSLU/research/TTS
- Examples of diphone speech corpora and algorithms developed at
- OGI for synthesis of American English and Mexican Spanish using
- the Festival framework.
-
- Lyricos
- http://www.cse.ogi.edu/CSLU/research/TTS/research/sing.html
- Demos of the Lyricos singing voice synthesis system.
- Concatenation-based synthesis of singing voice from MIDI input.
-
- Multi-Lingual TTS from Gerhard-Mercator University, Duisburg
- http://www.fb9-ti.uni-duisburg.de/demos/speech.html
- Synthesis in German, English or Japanese.
-
- TMH: Institutionen för Talöverföring och Musikakustik, Kungliga
- Tekniska Högskolan
- http://www.speech.kth.se/info/software.html
- Synthesis in Swedish, Finnish, Norwegian, Icelandic, Danish,
- British and American English, French, German, Italian, Spanish,
- LA Spanish and Greek.
-
- Haskins Laboratory WWW Site
- http://www.haskins.yale.edu/Haskins/MISC/special.html
- Examples of several types of speech synthesis. Articulatory
- Synthesis by HyperASY. SineWave Synthesis. Gestural
- Computational Model. Pattern Playback system of the 1940's!
-
- BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
- http://www.bestspeech.com/weblang.html
-
- Eurovocs Multilingual Speech Synthesis
- http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
- Based on Lernout and Hauspie technology.
-
- HADIFIX German Speech Synthesis
- http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
- Provided by the Institut für Kommunikationsforschung und
- Phonetik, Universität Bonn.
-
- Centigram's TruVoice Demo
- http://www.centigram.com/centigram/TruVoice/index.html
- Allows control of speech rate, pitch and other prosodic
- characteristics.
-
- MBROLA: Free Speech Synthesis Project
- http://tcts.fpms.ac.be/synthesis/modelcmp.html
- WWW demo of MBROLA which compares the quality of PSOLA,
- MBR-PSOLA, LPC, and Hybrid Harmonic/Stochastic concatenative
- synthesizers. Provided by the TCTS Lab, Faculté Polytechnique
- de Mons, Belgium
-
- Institute of Phonetic Sciences
- http://fonsg3.let.uva.nl/IFA-Features.html
- Links to lots of on-line speech synthesis demonstrations
- provided by the Institute of Phonetic Sciences of the Faculty
- of Arts of the University of Amsterdam.
-
- Yahoo page on speech generation
- http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Generation/
-
-
- ___________________________________________________________________________
-
- Q5.5: Speech Synthesis Software/Hardware
-
- Please email any updates, corrections or additions to the following
- list. The range of commercially available synthesis software is
- growing rapidly so any help in keeping up to date will be appreciated.
-
- Other lists of speech synthesis software on the WWW include:
-
- Kevin Lenzo's list of Macintosh Speech Resources and Apps
- http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
-
- Speech Toys Speech Synthesis Information
- http://www.speechtoys.com/spchtoys/spsyn.html
-
- In the FAQ...
-
- The following speech synthesis software/hardware is described in the
- comp.speech FAQ.
-
- _Apple Macintosh_
- * BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
- * Infovox Product Range
- * Macintosh Speech Output Applications
- * Macintosh Speech Synthesis Manager
- * MacYack Pro
- * MBROLA: Free Speech Synthesis Project
- * ProVoice Developer's Speech Toolkit from First Byte
- * SENSYN speech synthesizer
- * Sound Bytes Developer's Kit
-
- _Windows (including 95, NT, 3.1)_
- * AcuVoice
- * AT&T Watson Speech Synthesis
- * BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
- * Creative TextAssist and TextAssist API
- * DECtalk: Text-to-Speech from Digital
- * ETI-Eloquence
- * HADIFIX
- * Infovox Product Range
- * IPOX: All Prosodic Speech Synthesis Architecture
- * Lernout and Hauspie Text-To-Speech Windows SDK
- * Listen2 Text Reader
- * MBROLA: Free Speech Synthesis Project
- * Monologue for Windows from First Byte
- * PAM - A Text-To-Speech Application
- * ProVerbe Speech Engine from ELAN Informatique
- * ProVoice Developer's Speech Toolkit from First Byte
- * SENSYN speech synthesizer
- * Sound Bytes Developer's Kit
- * Tinytalk
- * TruVoice from Centigram
- * WinSpeech
- * ZMD Speech Synthesis
-
- _DOS_
- * CSRE: Computerized Speech Research Environment
- * Infovox Product Range
- * MBROLA: Free Speech Synthesis Project
- * ProVoice Developer's Speech Toolkit from First Byte
- * SENSYN speech synthesizer
- * spchsyn.exe
- * Tinytalk
- * ZMD Speech Synthesis
-
- _OS/2_
- * ProVerbe Speech Engine from ELAN Informatique
- * ProVoice Developer's Speech Toolkit from First Byte
- * Sound Bytes Developer's Kit
-
- _Unix_
- * AcuVoice
- * AsTeR
- * BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
- * DECtalk: Text-to-Speech from Digital
- * ETI-Eloquence
- * Emacspeak - A Speech Output Subsystem For Emacs
- * Festival Speech Synthesis System
- * JSRU
- * Klatt-style synthesiser
- * KPE80 - A Klatt Synthesiser and Parameter Editor
- * "learph": Trainable text-to-phoneme software by Antonio Lucca
- * Lucent Technologies Bell Labs Text-to-Speech system
- * MBROLA: Free Speech Synthesis Project
- * Orator from Bellcore
- * ProVerbe Speech Engine from ELAN Informatique
- * rsynth
- * SENSYN speech synthesizer
- * SGI Developers Toolbox Synthesiser
- * Speak
- * TrueTalk
- * TruVoice from Centigram
-
- _Integrated Circuits and Dedicated Hardware_
- * Eurovocs
- * Infovox Product Range
- * ProVerbe Speech Engine from ELAN Informatique
- * RC Systems V8600/V8601 Text to Speech synthesizers
-
- _Other Platforms_
- * BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
- * TheBigMouth (NeXT)
- * MBROLA: Free Speech Synthesis Project
- * Narrator Translator Library (Amiga)
- * Narrator (Amiga)
- * TextToSpeech Kit (NeXT)
- * Orator from Bellcore
- * SENSYN speech synthesizer
- * WreadFiles: File reader for Commodore Amiga
-
- _Unknown_
- * Lernout and Hauspie Text-To-Speech (3 products)
- * SIMTEL
- * Text to Phoneme Program 1
- * Text to phoneme program 2
- * Text to phoneme program 3
-
-
-
- AcuVoice
-
- * Platform: Windows, Solaris
- * Description: AcuVoice is a natural sounding text-to-speech system
- built using a concatenative approach. Currently it is available
- for an American English Male Voice. Software Developer Kits are
- available for the Windows Platform (32-Bit) and also for the
- Solaris Platform. More information and samples are available on
- the Acuvoice web site.
- * Contact: AcuVoice, Inc.
- 84 W. Santa Clara Street, Suite 720, San Jose, CA 95113-1810
- Ph: 1(408)289-1661, Fax: 1(408)289-1201
- Demo: 1(408)289-1177
- Email: AcuVoice1@AOL.COM
- WWW: http://www.acuvoice.com/
-
-
-
- AsTeR
-
- * Platform: UNIX
- * Description: TTS front-end program which encodes structural
- information about documents in speech synthesis. For more
- information check out:
-
- http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html
-
- * Operation requirements: Lisp: Lucid, clisp
- * Contact: T. V. Raman
- WWW: http://www.research.digital.com/CRL/personal/raman/raman.html
-
- Email: raman@adobe.com
-
-
-
- AT&T Watson Speech Synthesis
-
- * Platform: Windows 95/NT on a Pentium 75 MHz or higher
- * Description: Watson is a software implementation of AT&T Bell
- Laboratories voice processing technology. Watson includes BLASR
- Speech Recognition (see Q6.6) and FlexTalk speech synthesis. It
- requires no special hardware to run other than a standard sound
- card and/or phone card. Technical details for the FlexTalk speech
- synthesis include:
- + Compliant with MS Speech API.
- + Male and Female Voices available
- + 8 kHz and 11 kHz output
- + SoundBlaster compatible sound card and drivers required
- + Context sensitive abbreviation expansion
- + Accurate pronunciation of most proper names
- + Adjustable vocal tract size, speed, volume, pitch, etc.
- + American English only - other languages in development
- The AT&T Advanced Speech Products Group home page provides more
- detailed information including a Frequently Asked Questions list,
- information for application developers on the Independent Software
- Vendor (ISV) Program (including info on the SDK, licensing, and
- the training program).
- * Requirements: Uses 2 MB RAM, 10 MB Disk. Requires a Pentium 75 MHz
- or higher (uses
- * Cost and Availability: WATSON is a software-based speech platform
- with a Software Developers Kit (SDK) that allows application
- developers to use voice processing in their applications. It is
- not available as a stand-alone product.
- Licensing information (inc. price) is provided in the AT&T
- Advanced Speech Products Group home page
- * See also: Watson BLASR speech recognition in Q6.5, Microsoft
- Speech API, and Advanced Speech API.
- * Contact: AT&T Advanced Speech Products Group
- Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
- Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
- Email: aspg@attmail.com
- WWW: http://www.att.com/aspg/
-
-
-
- BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
-
- * Platform: available for Macintosh, Sun, Silicon Graphics, Windows
- PC and IBM RS/6000 platforms, and can be ported to others.
- * Description: BeSTspeech reads ASCII text with no vocabulary limits.
- Available for Dutch, English (male and female), French, German,
- Italian, Portuguese, Spanish, Arabic, Cantonese, Japanese, Korean,
- Malay, Mandarin and Russian.
- * Availability: Berkeley Speech Technologies, Inc does not sell end
- user toolkits or products.
- * Contact: Berkeley Speech Technologies, Inc.
- 2246 Sixth Street, Berkeley, California 94710, USA
- Ph: (510) 841-5083, Fax: (510) 841-5093
- Email: webmaster@bst.com
- WWW: http://www.bestspeech.com/index.html
-
-
-
- TheBigMouth - a Text to Speech Program
-
- * Platform: NeXT
- * Description: Text to speech program based on concatenation of
- pre-recorded speech segments.
- * Availability:
- ftp://ftp.cs.keio.ac.jp/pub/NeXT/source/TheBigMouth1.0.tar.Z
-
-
-
- Creative TextAssist
-
- * Platform: Windows
- * Description: Based on DECtalk speech synthesis. A detailed
- description of TextAssist is provided on the Creative WWW pages.
- TextAssist TextReader provides a convenient Windows user interface
- for text reading.
- * Availability: Creative TextAssist is bundled with most (all?)
- Creative Sound Blaster audio cards. TextAssist preview software is
- available from the Creative Labs TextAssist home page.
- * Contact: Creative Labs, Inc.
- Address, phone, email etc unknown
- WWW: http://www.creaf.com/ :
- http://www.creaf.com/wwwnew/tech/devcnr/tassist.html
-
- Creative TextAssist API
-
- * Platform: Windows
- * Description: The TextAssist API (TAAPI) is created for Microsoft
- Windows 3.1x and Windows 95 developers who intend to develop
- 16-bit Text-to-Speech software applications using Creative's
- TextAssist speech engine. It supports direct control of speech
- output characteristics, concurrent playback of text-to-speech and
- wave files, foreign language support, speech synchronization,
- exception dictionaries. It also includes a voice editing tool for
- creating new custom voices, and a Visual Basic Custom Control for
- high-level support in Visual Basic and other languages.
- * Availability: The TextAssist API is released to registered
- developers at no cost.
- * Contact: WWW: http://www.creaf.com/
- FAQ: http://www.creaf.com/wwwnew/tech/devcnr/tassfaq.html
-
-
-
- CSRE: Computerized Speech Research Environment
-
- * Platform: DOS
- * Description: CSRE is a software system which includes an
- implementation of the Klatt speech synthesizer. See the CSRE entry
- in Q1.9 and the AVAAZ WWW pages for more detail.
- * Contact: AVAAZ Innovations Inc.
- P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G
- 2B0
- Ph: +1-519-472-7944, Fax: +1-519-472-7814
- Email: info@avaaz.com
- WWW: http://www.icis.on.ca/homepages/avaaz/
-
-
-
- DECtalk Speech Synthesis
-
- * Platform: Windows NT, Alpha with Digital UNIX and RS232 ports
- * Description: Converts ordinary text into natural-sounding,
- intelligible speech. Provides personalized voices, and extensive
- user controls. DECtalk technology is available for the following
- packaging options.
- + DECtalk PC card option: An industry-standard ISA/EISA bus
- card implementation that can be integrated with any Intel 486
- processor-based system running DOS or Windows. Applications
- can be interfaced to the bus via a DOS Terminate and Stay
- Resident (TSR) driver or a Windows Dynamic Link Library
- (DLL). This option is available with an external speaker with
- volume control and headphone jack.
- + DECtalk Express external package: An external, portable
- package that you can plug in to any PC or serial port. The
- external package includes a built-in speaker and headphone
- jack, plus combined on/off and volume controls and a
- rechargeable battery pack.
- + DECtalk Software solution: Software-only text to speech for
- Alpha or Intel systems running Windows NT or Alpha systems
- running Digital UNIX. Provides complete speech synthesis
- capabilities so developers can enhance applications with
- DECtalk technology. DECtalk Software output can be directed
- to audio devices, into WAVE files, or into memory buffers.
- * Pricing:
- ://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis-oi.html
- * More Information:
- Digital Equipment Corporation WWW pages: http://www.digital.com/
- DECtalk page:
- http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
- Ph: 1-800-DIGITAL
-
- DECtalk Software
-
- * Platform: Digital UNIX and Windows NT
- * Description: DECtalk converts standard ASCII text into natural,
- intelligible speech. Speech output through any audio device is
- supported by Microsoft Video for Windows or Multimedia Services
- for Digital UNIX. An API gives developers direct access to
- text-to-speech functions. Provides nine voice personalities (4
- female, 4 male, 1 child). Provides punctuation and tonal control,
- supports customized pronunciation of trade jargon and acronyms.
- Common programming interface works with both Alpha and Intel
- platforms.
- * More Information:
- Digital Equipment Corporation WWW pages: http://www.digital.com/
- DECtalk Software page:
- http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
- WWW:
- http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis.html
- Ph: 1-800-DIGITAL
-
-
-
- ETI-Eloquence
-
- * Platform: MS Windows (Win95,NT,3.1), Solaris, SunOS, SGI, RS/6000
- * Description: ETI-Eloquence is a software based text-to-speech
- system. It generates waveforms completely algorithmically instead
- of by concatenating waveforms, for maximum flexibility and
- naturalism. For instance, when the user requests a deeper voice,
- the software simulates a larger vocal tract, instead of simply
- pitch-shifting samples. It uses high-level linguistic parsing,
- which obviates the need for a huge dictionary. It handles numbers,
- acronyms, currency, etc. It includes a set of annotation symbols,
- for placing stress on particular words, expressing
- excitement/boredom, etc. Also allows phonetic input. Supports MS
- SAPI.
- Produces male and female voices for General American English.
- Dialects under development include Alabama and Brooklyn.
- * Price: Flexible license agreements on application.
- * Availability: Eloquent Technology, Inc.
- 2389 North Triphammer Road, Ithaca, NY 14850, USA
- Ph: (607) 266-7025, Fax: (607) 266-7030
- Email: info@eloq.com
- WWW: http://www.eloq.com/
-
-
-
- Emacspeak - A Speech Output Subsystem For Emacs
-
- * Platform: UNIX, Emacs
- * Description: Emacspeak is a speech output system that will allow
- someone who cannot see to work directly on a UNIX system.
- Emacspeak is built on top of Emacs. With emacspeak loaded, Emacs
- provides spoken feedback for everything you do. Emacspeak
- currently supports the new Dectalk Express speech synthesizer, as
- well as older versions of the Dectalk e.g. the MultiVoice. See the
- Emacspeak WWW page, the Emacspeak FAQ or the Emacspeak
- distribution for additional details.
- * Requirements: Requires GNU FSF Emacs 19 (version 19.23 or later)
- and TCLX 7.3B (Extended TCL) to run Emacspeak.
- * Availability:
-
- Emacspeak WWW page
- http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.html
-
- Emacspeak source
- http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.tar.gz
-
- * Contact: T. V. Raman, raman@adobe.com
-
-
-
- Eurovocs
-
- * Platform: Various - RS232 hardware connection
- * Description: Eurovocs is a stand-alone text-to-speech synthesizer
- which uses the text-to-speech technology of Lernout and Hauspie
- Speech Products. Available for Dutch, French, German and American
- English with other languages planned for release soon. One
- Eurovocs device can support two different languages. Eurovocs can
- be connected to any computer via a standard serial interface
- (RS232). It supports personal dictionaries, generation of DTMF
- tones, and pronunciation of special character sequences such as
- digit strings, telephone-numbers, date and time indications,
- abbreviations, alphanumeric strings etc.
- * Contact: Technologie & Revalidatie
- Postbus 128, B-9000 Gent, Belgium
- Ph: +32-9-264 33 97, Fax: +32-9-264 35 94
- E-mail: noe@elis.rug.ac.be
- WWW:
- http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
-
-
-
- Festival Speech Synthesis System
-
- * Platform: General Unix (including Solaris (2.4,2.5), SunOS, HPUX,
- SGIs, Linux, Dec Alpha, FreeBSD)
- * Description: Festival is a general multi-lingual speech synthesis
- system developed at CSTR, University of Edinburgh. It offers a
- full text to speech system with various APIs, as well as an
- environment for development and research of speech synthesis
- techniques. It is written in C++ with a Scheme-based command
- interpreter for general control. Festival's home page offers
- demos, the full manual and access to the download page. The
- distribution includes full source and documentation, and lexicons
- and speech databases for British English text to speech.
- * Price: Free for non-commercial use
- * Availability: by anonymous ftp:
- WWW: http://www.cstr.ed.ac.uk/projects/festival/download.html
- ftp: ftp://ftp.cstr.ed.ac.uk/pub/festival/
-
-
-
- HADIFIX
-
- * Platform: Windows
- * Description: German speech synthesis system developed at the
- Institute for Communications Research and Phonetics, University
- of Bonn. Provides conversion of input text to phonemes, automatic
- prediction of stress, phrasing and pitch, and speech generation by
- concatenation of small units of natural speech. Demisyllables and
- similar units are used; they comprise all consonants before the
- vowel and the beginning of the vowel (initial demisyllable) or the
- end of the vowel and the following consonants (final
- demisyllable). For example, the word 'Strolch' is formed by
- concatenating 'Stro' and 'olch'.
- * Demo: Windows demo software available. Limited to synthesis of one
- short text (text.txt) at a time. Speech format limitations too.
- 1.3MB file.
- ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadidemo.zip
- A 1993 version is available with unlimited synthesis from a string
- of phonemic symbols and accent markers. 6MB file.
- ftp://asl1.ikp.uni-bonn.de/pub/hadifix/hadi25.lzh
- * WWW: http://asl1.ikp.uni-bonn.de/~tpo/Hadifix.en.html
- * On-line demo: http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
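-
- The demisyllable split described for HADIFIX above can be sketched
- roughly as follows. This Python example works on spelling, as the
- 'Strolch' example does; a real system would operate on phoneme
- strings and recorded speech. The function name and the vowel set are
- assumptions made for this illustration and are not part of HADIFIX.
-
```python
# Hypothetical vowel letters for a spelling-based illustration (German).
VOWELS = set("aeiouäöüAEIOUÄÖÜ")


def demisyllables(syllable):
    """Split one syllable into (initial, final) demisyllables.

    The initial demisyllable keeps the consonants before the vowel plus
    the vowel itself; the final demisyllable keeps the vowel plus the
    consonants after it. The vowel nucleus is shared by both halves.
    """
    vowel_positions = [i for i, ch in enumerate(syllable) if ch in VOWELS]
    if not vowel_positions:
        raise ValueError("syllable has no vowel: %r" % syllable)
    first, last = vowel_positions[0], vowel_positions[-1]
    initial = syllable[: last + 1]  # onset consonants + vowel nucleus
    final = syllable[first:]        # vowel nucleus + coda consonants
    return initial, final


if __name__ == "__main__":
    print(demisyllables("Strolch"))  # ('Stro', 'olch')
```
-
- The sketch assumes its input is a single syllable; splitting a longer
- word into syllables first is outside its scope.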
-
-
-
- Infovox Product Range
-
- * Description: Multilingual Text-to-speech systems, languages
- available: American English, British English, German, French,
- Spanish, Italian, Swedish, Norwegian, Icelandic, Danish and
- Finnish.
- * Product name: INFOVOX 500, PC BOARD
- + Product description: Half length expansion board for IBM PC,
- XT, AT, PS/2 model 30 or compatible personal computers. The
- board can also be connected via the serial port. Language and
- control program for downloading into RAM or mounted on EPROMs
- + Platform: DOS/Windows with IBM PC, XT, AT, PS/2 model 30 or
- compatible
- + Delivered standard interface: MS DOS I/O driver
- * Product name: INFOVOX 600, OEM BOARD
- + Product description: OEM board built with CMOS IC's. Language
- and control program are stored in on-board fixed memory.
- + Platform: any, hardware interface: 9-pole D-SUB (RS 232-C)
- 300-9600 Baud.
- + Delivered standard interfaces: MS DOS I/O driver and
- interface to Apple Speech manager.
- * Product name: INFOVOX 700, DESKTOP UNIT
- + Product description: Desktop unit with built in Infovox 600
- to be connected to any computer or terminal via an RS 232-C
- serial interface. Built in loudspeaker and rechargeable
- battery for 4 hours use, and control knobs for continuous
- control of speech volume and speed.
- + Platform: various through hardware interface
- + Delivered standard interfaces: MS DOS I/O driver and
- interface to Apple Speech manager
- * Product name: INFOVOX 650, OEM BOARD
- + Product description: OEM-board built with CMOS IC's. Language
- and control program are stored in on-board memory.
- + Platform: any, hardware interface: 9 pole D-SUB (RS 232-C)
- 300-9600 Baud
- + Delivered standard interfaces: MS DOS I/O driver and
- interface to Apple Speech manager
- * Product name: INFOVOX 750, DESKTOP UNIT
- + Product description: Desktop unit with built in Infovox 650
- to be connected to any computer or terminal via an RS 232-C
- serial interface. Built in loudspeaker and rechargeable
- battery for 5 hours use, and a control knob for continuous
- control of speech volume.
- + Platform: various through hardware interface. Delivered
- standard interfaces include MS DOS I/O driver and interface
- to Apple Speech manager
- * Product name: Infovox 210, software for Apple Macintosh
- + Product description: Software based text-to-speech
- conversion. Produces 16 bit and 8 bit sound. Delivered on
- 3.5" diskettes with user lexicon and a complete
- documentation.
- + Platform: Apple Macintosh with minimum 68030, 33 MHz
- microprocessor.
- + Delivered standard interfaces: Standard interface to Apple
- Speech manager
- * Product name: Infovox 220, software for Microsoft Windows.
- + Product description: Software based text-to-speech
- conversion. Produces 16 bit sound and conforms to Microsoft
- Windows multimedia standard MCI. Delivered on 3.5" diskettes
- with user lexicon and a complete documentation.
- + Platform: Windows on IBM compatible PC with minimum 486/25MHz
- microprocessor.
- + Delivered standard interfaces: Standard interface to
- Microsoft Windows 3.1 and sound boards supporting Microsoft
- Windows multimedia driver for audio.
- * Contact: Telia Promotor Infovox AB
- TTS Sales Division
- P.O. Box 2069, S-171 02 Solna, Sweden
- Ph: +46 8 764 35 00, Fax: +46 8 735 78 76
- Email: tts-sales@infovox.se
- WWW: http://www.promotor.telia.se/NYA/cc/t-s/index.html
-
-
-
- IPOX: All Prosodic Speech Synthesis Architecture
-
- * Platform: Windows
- * Description: IPOX is an experimental, all-prosodic speech
- synthesizer, developed by Arthur Dirksen and John Coleman. IPOX is
- freely available (after registration) for evaluation and
- non-profit research purposes.
- * Requirements: PC (preferably a fast 486) running Windows 3.1 or
- higher. Sound output requires a 16-bit Windows-compatible sound
- card
- * Availability: By WWW from
- http://www.tue.nl/ipo/people/adirksen/ipox/ipox.htm
-
-
-
- JSRU
-
- * Platform: UNIX and PC
- * Cost: 100 pounds sterling (for academic institutions and
- industry)
- * Description: A C version of the JSRU system, Version 2.3 is
- available. It's written in Turbo C but runs on most Unix systems
- with very little modification. A Form of Agreement must be signed
- to say that the software is required for research and development
- only.
- * Contact: Dr. E. Lewis (eric.lewis@bristol.ac.uk)
-
-
-
- Klatt-style synthesiser
-
- * Platform: Unix
- * Cost: Free
- * Description: Software posted to comp.speech in late 1992.
- * Availability: By ftp from the comp.speech ftp site
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.gz
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.Z
- * See also: KPE80 - A Klatt Synthesiser and Parameter Editor.
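-
- The core building block of a Klatt-style formant synthesiser is a
- second-order digital resonator, one per formant. The sketch below
- assumes the textbook difference equation from Klatt's 1980
- description; the function names and the 10 kHz sample rate in the
- usage note are illustrative and are not taken from the klatt.3.04
- sources.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Return (a, b, c) for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * freq_hz * t))
    a = 1.0 - b - c  # normalises the gain at 0 Hz to one
    return a, b, c


def resonate(samples, freq_hz, bandwidth_hz, sample_rate_hz):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out
```

- For example, resonate(source, 500.0, 60.0, 10000.0) would shape a
- source signal with a 500 Hz formant of 60 Hz bandwidth. A full
- synthesiser chains or sums several such resonators and updates
- their parameters frame by frame.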
-
-
-
- KPE80 - A Klatt Synthesiser and Parameter Editor
-
- * Platform: Unix
- * Description: The KPE80 program provides a graphical interface for
- the implementation of the Klatt 1980 formant synthesiser written
- by Jon Iles and Nick Ing-Simmons. It was inspired by IGE, a piece
- of code written by Rob Fletcher (
- http://www.york.ac.uk/~rpf1/IGE.html).
- * Technical Desc.: It is comprised of an X-Window interface and
- version 3.03 of the synthesiser code. The interface allows users
- to display and edit Klatt parameters using a graphical display
- which includes the time-amplitude waveform of both the original
- speech and its synthetic copy, and some signal analysis
- facilities. Most of the work in choosing the parameter values to
- produce the synthetic copy has to be done by the user. KPE will
- estimate the fundamental frequency contour from an original token;
- this estimate will need to be amended where errors occur. It is
- possible to specify the formant trajectories with some precision
- by overlaying the appropriate formant frequency parameter tracks
- on the spectrogram of the target waveform. A number of facilities
- exist to help in the refinement of parameter values: original and
- synthetic waveforms can be compared aurally, spectrally, and
- spectrographically using built-in speech analysis facilities.
- * File formats: KPE will read RIFF (.wav) files and SFS files. (SFS
- is a suite of speech-signal processing programs available free
- from Phonetics and Linguistics, UCL.)
- * Availability:
-
- KPE for SunOS 4.1.3 (statically compiled libraries)
- ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.sun413.tar.Z
-
- KPE for Linux (statically compiled libraries)
- ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.linux.tar.Z
-
- The source code (needs gcc and SUIT to compile)
- ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.src.tar.Z
-
- A postscript overview of KPE
- ftp://pitch.phon.ucl.ac.uk/pub/kpe/OVERVIEW.ps
-
- The SFS distribution
- ftp://pitch.phon.ucl.ac.uk/pub/sfs/
-
- * See also: Public domain Klatt-style speech synthesis code.
- * Contact: Andrew Simpson
- Department of Phonetics and Linguistics, University College London
-
- Wolfson House, 4 Stephenson Way, London NW1 2HE
- Email: a.simpson@ucl.ac.uk
- WWW: http://www.phon.ucl.ac.uk/home/andrew/home.html
-
-
-
- "learph": Trainable text-to-phoneme software by Antonio Lucca
-
- * Platform: UNIX
- * Description: Experimental software which learns text to phoneme
- translation from examples using decision-tree-like data
- structures. It is based on the assumption that each letter can
- correspond to different phoneme strings depending on the context.
- * Availability: Examples and source are available on the WWW:
- http://www.silab.dsi.unimi.it/~al367212/ttsdoc.html
- * Contact: Antonio Lucca: toninlcc@tesi.dsi.unimi.it
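-
- The idea that one letter maps to different phoneme strings
- depending on its context can be sketched with a simple learned
- table that backs off from a wide context to the letter alone. This
- is not learph's actual data structure or training procedure; the
- aligned training format, the "#" boundary marker and all function
- names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

BOUNDARY = "#"  # hypothetical marker for "no neighbouring letter"


def _context(letters, i):
    """Return (left letter, letter, right letter) for position i."""
    left = letters[i - 1] if i > 0 else BOUNDARY
    right = letters[i + 1] if i + 1 < len(letters) else BOUNDARY
    return left, letters[i], right


def train(aligned_words):
    """aligned_words: one list of (letter, phoneme_string) pairs per word.

    An empty phoneme string marks a silent letter.
    """
    wide = defaultdict(Counter)    # keyed by (left, letter, right)
    narrow = defaultdict(Counter)  # keyed by the letter alone
    for pairs in aligned_words:
        letters = [letter for letter, _ in pairs]
        for i, (letter, phonemes) in enumerate(pairs):
            wide[_context(letters, i)][phonemes] += 1
            narrow[letter][phonemes] += 1
    return wide, narrow


def predict(model, word):
    """Pick the most frequent phoneme string, preferring the wide context."""
    wide, narrow = model
    letters = list(word)
    result = []
    for i, letter in enumerate(letters):
        counts = wide.get(_context(letters, i)) or narrow.get(letter)
        if counts:
            best = counts.most_common(1)[0][0]
            if best:  # skip silent letters
                result.append(best)
    return result
```

- A real system would use wider and variable-length contexts; the
- two-level back-off here only shows why context changes the answer
- (for instance "c" before "i" versus "c" before "a").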
-
-
-
- Lernout & Hauspie Text-to-Speech (3 products)
-
- Lernout & Hauspie have three TTS products. The products offer
- similar functionality but differ in hardware implementation and
- other details, as described below.
-
- * L&H tts2000/T: TTS for the Telephony and Telecommunications Market
- * L&H tts2000/M: TTS for the Computer and Multimedia Market
- * L&H tts3000/C: TTS for the Business and Consumer Electronics
- Market
-
- * Description: Text to Speech (TTS) software based on parameterized
- segment concatenation (diphones, triphones and tetraphones)
- algorithms. Available for US English, German, Dutch, French,
- Spanish (Castilian), Italian and Korean. General features include:
- + The control of volume, speech rate and speech pitch.
- + The use of control sequences to customize TTS output (adding
- pauses, using phonetic input, etc.).
- + Switching between languages at run time.
- + A personal vocabulary editor is available for building
- exception dictionaries.
- + Readout modes: letter by letter, word by word or sentence by
- sentence.
- + Input formats: orthographic input, phonetic input, phonetic
- input with prosodic information.
- * tts2000/T
- + Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit
- linear PCM.
- + Sampling Frequency: 8kHz
- + Single channel platform examples: SHARP SH7000, ARM6/ARM7,
- Intel i960, TI TMS320C31, AT&T DSP3210
- + Multi-channel platform examples: TI TMS320C31, AT&T DSP3210
- * tts2000/M
- + Output formats: 8/16 bit wave format, 8 bit mu-law PCM, 8 bit
- A-law PCM, 16 bit linear PCM.
- + Sampling Frequency: 8/10/11.025 kHz
- + Single processor platform examples: ARM6/ARM7, Intel
- 386/486/Pentium, Motorola 68040
- + Two processor platform examples: {Intel 386/486/Pentium or
- Motorola 68030} and {ADI ADSP21XX or Motorola 5600X or TI
- TMS320C25/20C5X}
- * tts3000/C
- + Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit
- linear PCM.
- + Sampling Frequency: 10kHz
- + Single processor platform examples: SHARP SH7000, ARM6/ARM7,
- Intel i960, TI TMS320C31, AT&T DSP3210
- + Two processors platform examples: { SHARP SH7000 or ARM6/ARM7
- or Intel 386EX or Motorola 683XX} and {ADI ADSP21XX or
- Motorola 5600X or TI TMS320C25/C5X or TI TSP50C10}
- * See also: L&H Windows TTS SDK
- * More Information: on the Lernout & Hauspie WWW pages:
- http://www.lhs.com/tts.html
- * Price: Unknown
- * Contact: Lernout and Hauspie Speech Products
- 20 Mall Road, 4th Floor
- Burlington, MA 01803, USA
- Ph: +1-617-238-0960, Fax: +1-617-238-0986
- Email: sales@lhs.com
- WWW: http://www.lhs.com/
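-
- The 8 bit mu-law output format listed above stores samples on a
- logarithmic scale so that quiet passages keep more resolution than
- 8 bit linear PCM would give them. The sketch below uses the
- continuous mu-law companding curve with mu = 255 and a plain 0-255
- code. It is not the bit-exact segmented encoding used by telephony
- hardware, and nothing here is taken from the L&H products.

```python
import math

MU = 255.0


def compress(x):
    """Map a linear sample in [-1, 1] onto the mu-law curve in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x):
    """Quantise a linear sample to an illustrative 0..255 code."""
    return int(round((compress(x) + 1.0) / 2.0 * 255))


def decode_8bit(code):
    """Recover an approximate linear sample from a 0..255 code."""
    return expand(code / 255.0 * 2.0 - 1.0)
```

- Round-tripping a quiet sample such as 0.01 through encode_8bit()
- and decode_8bit() gives an error well below the step size of a
- uniform 8 bit quantiser, which is the point of companding.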
-
-
-
- Lernout & Hauspie Text-to-Speech Windows SDK
-
- * Platform: Windows
- * Description: The L&H Text-to-Speech software developers kit lets
- you integrate text-to-speech technology into your own or
- existing PC applications under Microsoft Windows 3.1. The
- software converts written text into clear, human-sounding
- synthetic speech.
- * Requirements: IBM-compatible PC 386 DX/33 + 8Mb RAM + MS DOS 5.0 +
- MS Windows 3.1 (or higher) + SoundBlaster compatible sound board.
- * See also: L&H TTS Products
- * More Information: on the Lernout & Hauspie WWW pages:
- http://www.lhs.com/tts.html
- * Price: Unknown
- * Contact: Lernout and Hauspie Speech Products
- 20 Mall Road, 4th Floor
- Burlington, MA 01803, USA
- Ph: +1-617-238-0960, Fax: +1-617-238-0986
- Email: sales@lhs.com
- WWW: http://www.lhs.com/
-
-
-
- Listen2 Text Reader
-
- * Platform: Windows
- * Description: Listen2 is a multi-voice, multi-language text reader.
- Listen2 comes in two versions, English only that uses high quality
- male and female voices, and the International version that can
- speak up to 5 different languages: English, German, French,
- Spanish or Italian, all in male voices. The basic International
- program comes with built-in English and additional language fonts
- can be purchased separately. The English version comes complete.
- Both programs are dynamically switchable and configurable. This
- means that you can press a hot key to speed up the speech, make it
- louder or quieter, etc., as it is reading a file. You can also
- insert flags in text files to make it switch voices or switch
- languages, depending on what version you have.
- Listen2 has all the features of the JTS Reader shareware program
- plus a few more. It will voice your reminder messages or
- appointment list on start-up. It will also speak a reminder
- message on shutting down.
- * WWW: A more complete description is available on the Listen2 web
- page
- * Contact: Tom Slemko: e-mail: tslemko@islandnet.com, or,
- JTS Micro Consulting Ltd
- 10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0
- WWW: http://www.islandnet.com/jts/
-
-
-
- Lucent Technologies Bell Labs Text-to-Speech system
-
- * Platform: UNIX and Win-95/NT
- * Description: Lucent Technologies provides a web site with demos and
- samples of their latest speech synthesis technology. The site has
- interactive demos in American English, German, and Mandarin
- Chinese, and the capability to adjust voice parameters on the fly.
- Pre-synthesized demos for French, Italian, Russian, and Romanian
- are also provided.
- The site includes downloadable papers with detailed system
- descriptions.
- * WWW: http://www.bell-labs.com/project/tts/
-
-
-
- Macintosh Speech Output Applications
-
- * Platform: Macintosh
- * Description: A comprehensive list of Macintosh Speech Applications
- is provided by Kevin Lenzo at CMU:
- http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
- The Apple Speech WWW Site also has some useful information:
- http://www.speech.apple.com/
-
-
-
- Speech Manager and PlainTalk
-
- * Platform: Macintosh
- * Description: Apple's text-to-speech system extensions that enable
- applications to perform text-to-speech conversion. The Speech
- Manager runs on most Macs, but PlainTalk (and the high quality
- voices) requires a 68020 Mac or better.
- * Availability: By anonymous ftp from:
- ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk 1.4.1/
- This directory contains subdirectories for recent versions of
- PlainTalk. The current release (PlainTalk 1.4.1) contains the
- English Text-To-Speech with about a dozen voices
- (English_Text-to-Speech.hqx: 5.3 MByte), Mexican Spanish
- (Mexican_Spanish_TTS.hqx: 2.8 MByte), and the English Speech
- Recognition software (English_Speech_Recognition.hqx: 2.3MByte).
- * Cost: Free
- * WWW: The latest information is available from Apple's WWW page for
- speech recognition and synthesis:
- http://www.speech.apple.com/
- * Note 1: Check out Kevin Lenzo's list of Macintosh Speech
- Applications.
- * Note 2: Joshua Baer (josh@skyweyr.com) runs a mailing list for
- Plaintalk. For subscription and other information visit the
- Plaintalk Discussion List Home page
- * Contact: Apple Computer, Inc.
- 1 Infinite Loop, Cupertino, CA 95014, USA
- WWW: http://www.speech.apple.com/
- Email: PlainTalk@atg.apple.com
-
-
-
- MacYack Pro
-
- * Platform: Macintosh
- * Description: MacYack Pro is a commercial speech package for
- Macintosh that uses the PlainTalk Text-to-Speech synthesis
- software. Features include:
- + Add speech to any word processor.
- + Hear notification dialogs and other dialog boxes.
- + See and hear a customized message at startup or shutdown.
- + Hear calculations instantly.
- + Correct pronunciation errors.
- + Create custom double-clickable "speech files."
- + Have speaking alert sounds.
- + Add speech to HyperCard stacks.
- + Use AppleScript to add speech to other programs.
- * Price: $29.95 for a limited time, reduced from $49.95 regular
- price. 30-day money-back guarantee.
- * Contact: Scantron Quality Computers
- 20200 Nine Mile Rd. St. Clair Shores, MI 48080
- Ph: 1-800-777-3642, Fax: 810-774-2698
- E-mail: sales@sqc.com
- WWW: http://www.sqc.com/
- Product Info: http://www.lowtek.com/macyack/
-
-
-
- MBROLA: Free Speech Synthesis Project
-
- * Platform: Sun4, Sun/SunOS5.4, HP, VAX/VMS, DEC Alpha/VMS, PC/DOS,
- PC/Windows 3.1, PC/Windows 95, PC/Solaris2.4, PC/Linux, SGI
- INDY/IRIX, NeXT, and soon for Macintosh.
- * Description: MBROLA is a high-quality, diphone-based speech
- synthesizer which is available for free. It is provided by the
- TCTS Lab of the Faculte Polytechnique de Mons (Belgium), which aims
- to obtain a set of speech synthesizers for as many languages as
- possible, free to use for non-commercial, non-military
- applications.
- MBROLA 2.00 takes a list of phonemes as input, together with
- prosodic information (duration of phonemes and a piecewise linear
- description of pitch), and produces 16bit speech samples at the
- sampling frequency of the diphone database (typically 16kHz). (It
- is therefore NOT a Text-To-Speech (TTS) synthesizer, since it does
- not accept raw text as input.) Databases are now being prepared
- for English, Spanish, Italian, Dutch, and Romanian. Collaborations
- are welcome. More information can be found at the MBROLA project
- homepage.
- * Demonstration: WWW demo of MBROLA which compares the quality of
- PSOLA, MBR-PSOLA, LPC, and Hybrid Harmonic/Stochastic
- concatenative synthesizers is available at
- http://tcts.fpms.ac.be/synthesis/modelcmp.html.
- * Contact: Dr Thierry Dutoit
- Faculte Polytechnique de Mons, TCTS Lab,
- 31, bvd Dolez, B-7000 Mons, Belgium.
- Ph: +32-65-374133, Fax: +32-65-374129
- e-mail: mbrola@tcts.fpms.ac.be
- WWW: http://tcts.fpms.ac.be/synthesis/mbrola.html
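-
- The input described above (phonemes with durations plus a piecewise
- linear pitch description) can be illustrated with a short sketch.
- The in-memory representation used here, a list of (name,
- duration_ms, pitch_points) tuples with pitch positions given as a
- percentage of the phoneme's duration, is an assumption for
- illustration and is not presented as MBROLA's file format.

```python
def pitch_targets(phonemes):
    """Convert per-phoneme pitch points to absolute (time_ms, hz) targets.

    phonemes: list of (name, duration_ms, [(percent, hz), ...]).
    """
    targets = []
    start_ms = 0.0
    for _name, duration_ms, points in phonemes:
        for percent, hz in points:
            targets.append((start_ms + duration_ms * percent / 100.0, hz))
        start_ms += duration_ms
    return targets


def pitch_at(targets, t_ms):
    """Linearly interpolate the pitch at time t_ms between targets."""
    if not targets:
        raise ValueError("at least one pitch target is required")
    if t_ms <= targets[0][0]:
        return targets[0][1]  # hold the first value before the first target
    for (t0, f0), (t1, f1) in zip(targets, targets[1:]):
        if t_ms <= t1:
            return f0 + (f1 - f0) * (t_ms - t0) / (t1 - t0)
    return targets[-1][1]  # hold the last value after the last target
```

- A synthesizer of this kind would evaluate something like
- pitch_at() for every pitch period it generates, while the
- durations decide how long each diphone is stretched.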
-
-
-
- Monologue for Windows from First Byte
-
- * Platform: Windows
- * Description: Monologue is a software program that reads text from
- the clipboard in Windows 16 or 32 bit applications. It can be
- found as a bundled product with many sound cards and multimedia
- general purpose computer systems. Monologue can add the element of
- speech to virtually any text oriented application. Any
- pronounceable combination of letters and numbers will be spoken
- clearly. It can be applied to tasks such as eyes-free
- proofreading, data verification (e.g. spreadsheets), reading
- E-mail and more. User-changeable parameters provide control over
- the sound quality by allowing for changes in pitch, and the speed
- of speech. An exception dictionary saves preferred pronunciation
- of words and abbreviations.
- Monologue Win32 now includes support for the Microsoft SAPI.
- Monologue male "SpeechFonts" are available for US English, British
- English, German, French, Latin American Spanish, Italian. A US
- English Female SpeechFont is also available.
- For more detailed information and examples go to the First Byte
- WWW pages.
- * Availability: Currently bundled with many sound cards and
- multimedia general purpose computer systems. For pricing,
- licensing details, and release information see the First Byte WWW
- pages or email info@firstbyte.davd.com.
- * See also: ProVoice Developer's Speech Toolkit from First Byte
- * Contact: First Byte
- 19840 Pioneer Ave., Torrance, CA 90503
- Ph: 310-793-0610 Fax: 310-793-0611
- Email: info@firstbyte.davd.com
- WWW: http://www.firstbyte.davd.com/
-
-
-
- Narrator Translator Library
-
- * Platform: Amiga
- * Description: A US English text to phoneme translator, implemented
- as a resident software library, for use with the Amiga Narrator
- Device. This software was supplied as a standard part of the Amiga
- operating system software up to O.S version 2.04. (Translator
- version 37.1, 1991) Approximately 700 translation rules are used
- to create the 'ARPAbet' phonemes. This software is functional on
- all current Amiga systems (O.S. 3.1).
- * Availability: limited to pre-owned system software disks and
- unsold O.S upgrade kits (Pre-O.S. 2.1).
-
- Replacement Library: Translator42
-
- * Platform: Amiga
- * Description: an independent replacement for the Commodore-supplied
- "translator.library" which is a part of the Narrator speech
- synthesis package. It implements multi-lingual text-to-speech for
- an Amiga. The translation rules for each language are defined in a
- plain text 'Accent' file.
- There is a provision for the selection of unique languages for
- text segments by inserting in-line markup codes in the text: e.g.
- "Hello there! \french{Bonjour} \deutsch{gute morgen}".
- 'Accent' files for American English, British English, Swedish,
- Maori, Finnish, German, Icelandic, Klingon, Polish, Italian, and
- Welsh languages included in the archive.
- * Availability: The most current version, 42.4, of the library
- and source are available by anonymous ftp from Aminet:
- ftp://ftp.doc.ic.ac.uk/pub/aminet/util/libs/translator42.lha
- ftp://ftp.doc.ic.ac.uk/pub/aminet/dev/src/tran42src.lha
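-
- The in-line markup shown in the example above can be split into
- per-language segments before translation. The sketch below handles
- only the \name{text} form that appears in that example; how the
- library itself treats nesting or escaped braces is not covered
- here, and the "english" default is an assumption.

```python
import re

# Matches \name{text} with no nested braces inside the text.
MARKUP = re.compile(r"\\(\w+)\{([^{}]*)\}")


def split_by_language(text, default="english"):
    """Return a list of (language, text) segments in reading order."""
    segments = []
    pos = 0
    for match in MARKUP.finditer(text):
        if match.start() > pos:
            # Unmarked text between two markup codes uses the default.
            segments.append((default, text[pos:match.start()]))
        segments.append((match.group(1), match.group(2)))
        pos = match.end()
    if pos < len(text):
        segments.append((default, text[pos:]))
    return segments
```

- Each segment could then be passed to the translation rules of the
- matching 'Accent' file.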
-
-
-
- Narrator
-
- * Platform: Amiga
- * Description: Formant based speech synthesis. Includes an
- English-to-phoneme translation library, and a SPEAK: pseudo-device
- for speech output.
- * Hardware: Standard Amiga hardware
- * Availability: Part of AmigaOS
- * See Also: The Narrator Translator Library
-
-
-
- TextToSpeech Kit
-
- * Platform: NeXT Computers
- * Description: The TextToSpeech Kit does unrestricted conversion of
- English text to synthesized speech in real-time. The user has
- control over speaking rate, median pitch, stereo balance, volume,
- and intonation type. Text of any length can be spoken, and
- messages can be queued up, from multiple applications if desired.
- Real-time controls such as pause, continue, and erase are
- included. Pronunciations are derived primarily by dictionary
- look-up. The Main Dictionary has nearly 100,000 hand-edited
- pronunciations which can be supplemented or overridden with the
- User and Application dictionaries. A number parser handles numbers
- in any form. A letter-to-sound knowledge base provides
- pronunciations for words not in the Main or customized
- dictionaries. Dictionary search order is under user control.
- Special modes of text input are available for spelling and
- emphasis of words or phrases. The actual conversion of text to
- speech is done by the TextToSpeech Server. The Server runs as an
- independent task in the background, and can handle up to 50 client
- connections.
- * Misc: The TextToSpeech Kit comes in two packages: the Developer
- Kit and the User Kit. The Developer Kit enables developers to
- build and test applications which incorporate text-to-speech. It
- includes the TextToSpeech Server, the TextToSpeech Object, the
- pronunciation editor PrEditor, several example applications,
- phonetic fonts, example source code, and developer documentation.
- The User Kit provides support for applications which incorporate
- text-to-speech. It is a subset of the Developer Kit.
- * Hardware: Uses standard NeXT Computer hardware.
- * Cost:
- + TextToSpeech User Kit: $175 CDN ($145 US)
- + TextToSpeech Developer Kit: $350 CDN ($290 US)
- + Upgrade from User to Developer Kit: $175 CDN ($145 US)
- * Availability: Trillium Sound Research
-
- 1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
- Tel: (403) 284-9278 Fax: (403) 282-6778
- Order Desk: 1-800-L-ORATOR (US and Canada only)
- Email: TTSInfo@trillium.ab.ca
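-
- The pronunciation lookup described above (dictionaries consulted in
- a user-controlled order, with letter-to-sound rules as the
- fallback) can be sketched as follows. The function name, the
- dictionary contents and the stand-in fallback are hypothetical and
- do not reflect the Kit's actual API.

```python
def pronounce(word, dictionaries, letter_to_sound):
    """Return the first pronunciation found, else apply the fallback.

    dictionaries: mappings in the order they should be searched,
    for example [user_dictionary, application_dictionary, main_dictionary].
    letter_to_sound: callable used for words found in no dictionary.
    """
    key = word.lower()
    for dictionary in dictionaries:
        if key in dictionary:
            return dictionary[key]
    return letter_to_sound(key)
```

- Placing the User dictionary first lets its entries override the
- Main dictionary; reordering the list changes the priority without
- touching the lookup code.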
-
-
-
- Orator Text-to-Speech Synthesizer
-
- * Platform: SUN SPARC, Decstation 5000. Written in C, and therefore
- portable to other UNIX platforms. Some successful ports: HP,
- RS-6000, PC-Unix [Linux].
- * Description: Sophisticated speech synthesis package. Has text
- preprocessing (for abbreviations, numbers), acronym rules, and
- human-like spelling routines. Natural-sounding synthesis based on
- demisyllable concatenation. Has high accuracy for pronunciation of
- names of people, places and businesses in America; good accuracy
- for English text; rules for stress and intonation marking; various
- methods of user control and customization at most stages of
- processing.
- A new version of the ORATOR system is under development. Both
- ORATOR and this new "ORATOR II" system are capable of general text
- synthesis. The ORATOR II system has a more natural-sounding voice.
- * Hardware: Runs on common SPARC or Decstation workstations, using
- their internal audio output capability. Recommend at least 16M of
- memory.
- * WWW: More detailed information plus examples of ORATOR synthesis
- are available on the ORATOR WWW pages:
- http://www.bellcore.com/ORATOR/
- * Misc 1: A free demo cassette is available.
- * Misc 2: Examples of Orator are also available on the University of
- Birmingham Speech Synthesis "Museum" WWW site (see Q5.4).
- * Availability and Pricing: Contact Bellcore's Licensing Office
- Tel: 1-800-521-CORE (521-2673)
- Fax: 1-908-336-2559
- Email: Anthony Lindsey: alin1@panix.com
- WWW: http://www.bellcore.com/ORATOR/
-
-
-
- PAM - A Text-To-Speech Application
-
- * Platform: Windows
- * Description: PAM is a talking personal assistant and text reader
- application. It uses the ProVoice TTS package. PAM will verbally
- advise about appointments and reminder messages at specified times
- during the day. It can read text files, clipboard text, and text
- sent in DDE messages. Using the full verbal interface, PAM can be
- used by visually challenged individuals. Shareware - thirty day
- free trial.
- * Requirements: Any Windows sound card, speakers or headphones. Min.
- memory - 4 megs, 8 megs recommended.
- * WWW: A more complete description is available on the JTS homepage:
- http://www.islandnet.com/~tslemko/
- * Availability: The shareware can be downloaded by ftp from
- ftp://ftp.islandnet.com/jts/pam_en3c.zip. The file size is approx.
- 1 MByte.
- * Price: $US40 for the registered version.
- * Contact: Tom Slemko: e-mail: tslemko@islandnet.com, or,
- JTS Micro Consulting Ltd
- 10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0
-
-
-
- ProVerbe Speech Engine from ELAN Informatique
-
- * Platform: Windows 3.x, NT, 95, OS/2, Unix Solaris, Unix SCO and
- hardware
- * Description: The ProVerbe Speech Engine from ELAN Informatique
- produces natural sounding speech from written text. Naturalness is
- achieved by using the TD-PSOLA process from the CNET (France
- telecom's research lab.) which is based on the concatenation of
- elementary speech units (including diphones). Supported languages
- are British English, American English, Russian, German, French and
- Spanish. For multi-channel applications Elan Informatique also
- provides hardware platforms.
- Elan Informatique provides a SDK reference document (sdken.doc:
- WinWord6 format).
- * Demo versions: Telephone demonstration: +33-561 17 67 01
- Sample sound files and demonstration software available.
- A CD-ROM with all these demonstrations is available by
- registration.
- * Contact: Elan Informatique
- 4 rue Jean Rodier, 31400 TOULOUSE FRANCE
- Contact person: Pierre Delrat
- Phone: +33-561-36-0777 Fax: +33-61-36-0770
- BBS: +33-561-36-0788
- E-mail: sales@elan.fr
- ftp: ftp://ftp.elan.fr
- WWW: http://www.elan.fr/
-
-
-
- ProVoice Developer's Speech Toolkit from First Byte
-
- * Platform: ProVoice Developer's Toolkits are available for DOS,
- Windows 3.1, Windows 95, Windows NT, OS/2, and Macintosh.
- * Description: ProVoice allows programmers to add synthesized speech
- to their applications. Your program passes text strings to the
- ProVoice speech engine that translates text into audible speech.
- Male and/or female "SpeechFonts" are available for many languages:
- English, French, German, British English, Italian, and Spanish.
-
- ProVoice converts text to speech in two phases using a set of
- phonetic translation and pronunciation rules. First, the software
- analyzes and translates text into "sound descriptors", a phonetic
- language with pitch, duration, and amplitude codes which are
- needed to produce stress patterns in phrases and sentences. Rules
- are used to analyze words, numbers, and punctuation. The second
- phase converts the intermediate phonetic language into speech
- signals; algorithms drive distinct speech signals into smooth
- flowing, continuous, clear speech. Real time synchronization of
- mouth movement and word boundaries allows animation of a graphical
- talking character, or highlighting of displayed text as it is
- spoken.
- Necessary tools and examples are provided for programmers to
- manipulate the ProVoice speech technology; including installation
- instructions, extensive sample programs, and complete
- documentation. In addition, sample code is provided on disk to
- illustrate speech programming techniques.
- * Note 1: First Byte will perform custom work for embedded systems.
- * Note 2: ProVoice Windows includes support for the Microsoft SAPI.
- It will speak through any Windows-supported wave audio device.
- * Note 3: Distribution of ProVoice for commercial use is subject to
- execution of a Commercial Product Distribution License Agreement.
- * WWW: For more detailed information and examples go to the First
- Byte WWW page: http://www.firstbyte.davd.com/
- * See also: Monologue for Windows from First Byte
- * Price and Availability: Contact First Byte
- * Contact: First Byte
- 19840 Pioneer Ave., Torrance, CA 90503
- Ph: 310-793-0610, Fax: 310-793-0611
- Email: info@firstbyte.davd.com
- WWW: http://www.firstbyte.davd.com/
-
-
-
- RC Systems V8600/V8601 Text to Speech synthesizers
-
- * Platform 1: IBM PC: ISA card.
- * Platform 2: Interface to PC/104 standard microcontrollers.
- * Platform 3: Standalone (or embedded) hardware thru RS232 or
- parallel printer port or processor bus.
- * Description: Converts plain ASCII text to speech. Programmable
- voices, pitch, rate, volume, etc. Built-in DTMF and tone
- generators.
- * Price: $151-$299 US (qty 1)
- * Contact: RC Systems
-
- 1609 England Avenue, Everett, WA 98203, USA
- Ph: (206) 355-3800 Fax: (206) 355-1098
- Europe: +44181 539-0285
-
-
-
- rsynth
-
- * Platform: Various (including Solaris2.3, SunOS4.1.3, HPUX, SGI
- Irix4.x, Linux)
- * Description: Public domain text-to-speech system assembled from a
- variety of sources. It supports CMU and BEEP format dictionaries
- (as described in Q1.10) and now utilises stress marks in the
- dictionary in synthesising intonation.
- * Price: Free
- * Misc: Axel Belinfante has implemented a WWW rsynth demo:
- http://wwwtios.cs.utwente.nl/say.
- * Availability: by anonymous ftp from
-
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.Z
-
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz
-
-
-
- SENSYN speech synthesizer
-
- * Platform: PC/DOS/Windows, Macintosh, Sun, and NeXT
- * Rough Cost: $300
- * Description: This formant synthesizer produces speech waveform
- files based on the (Klatt) KLSYN88 synthesizer. It is intended for
- laboratory and research use. Note that this is NOT a
- text-to-speech synthesizer, but creates speech sounds based upon a
- large number of input variables (formant frequencies, bandwidths,
- glottal pulse characteristics, etc.) and would be used as part of
- a TTS system. Includes full source code.
- * Availability: Sensimetrics Corporation
- Sidney Street, Cambridge MA 02139.
- Fax: (617) 225-0470; Tel: (617) 225-2442.
- Email: sensimetrics@sens.com
- WWW: http://www.sens.com/
-
-
-
- SGI Developers Toolbox Synthesiser
-
- * Platform: SGI
- * Description: The SGI Developer Toolbox 4.0 CDROM contains a
- basic public domain text-to-speech program in the publics/speak
- directory. The directory includes man pages and source.
- * Availability: on the SGI Developer Toolbox 4.0 CDROM
-
-
-
- SIMTEL
-
- A wide range of speech related software, sound-blaster software and
- signal processing software for PCs is available on SimTel and its
- mirror sites. It can be obtained by ftp from:
-
- ftp://ftp.coast.net/SimTel/msdos/voice/
-
- and is now on the WWW:
-
- http://www.acs.oakland.edu/oak/SimTel/win3/sound.html
-
- Voicemaker
-
- The archives include the program Voicemaker which synthesises speech
- from phonemes using "concatenation" of phonemes recorded by the user.
- Voicemaker is a freeware program. It requires an IBM or compatible,
- 512KB RAM, sound blaster compatible sound card.
-
- ftp://ftp.coast.net/SimTel/msdos/voice/vm110.zip
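-
- Synthesis by concatenating recorded phonemes amounts to looking up
- one stored waveform per phoneme and joining them end to end. The
- sketch below adds an optional linear cross-fade at each join to
- soften clicks; the cross-fade, the data layout and the function
- name are assumptions and are not a description of how Voicemaker
- itself works.

```python
def concatenate(units, phonemes, overlap=0):
    """Join recorded units (lists of float samples) into one waveform.

    units: dict mapping a phoneme name to its recorded samples.
    phonemes: sequence of phoneme names to speak, in order.
    overlap: number of samples to cross-fade at each join.
    """
    out = []
    for name in phonemes:
        segment = units[name]
        n = min(overlap, len(out), len(segment))
        base = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit
            out[base + i] = out[base + i] * (1.0 - w) + segment[i] * w
        out.extend(segment[n:])
    return out
```

- With overlap=0 the units are simply appended, which is the plainest
- form of concatenation.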
-
-
-
- Sound Bytes Developer's Kit
-
- * Platform: Subroutine library for Windows, OS/2 and Macintosh
- * Hardware: Windows - 16 MHz 80386 (minimum) running Windows 3.1; 4
- Mb RAM with at least 1.4 Mb RAM free. Disk space 1.4 Mb.
- OS/2 - 16 MHz 80386 (minimum) running OS/2 2.0 or above; 8 Mb RAM
- with at least 1.4 Mb RAM free.
- Mac - Any Mac with at least 2.5 Mb of RAM running 6.0.4 or higher.
- Telephone compatible. Compatible with commonly used sound cards.
- * Description: SBDK is a software-only sentence-level synthesizer
- that converts unrestricted English text (ASCII) into synthesized
- voice through diphone concatenation. SBDK utilizes parsing to
- incorporate the intonational and rhythmic patterns of normal
- speech. The developer's kit includes two voices, one female and
- one male. The product has a 55,000-word built-in dictionary and a
- tool for creating customized user dictionaries. It converts
- numbers, dates, dollars, phone numbers and times to words, and has
- a SoundOut facility that provides a choice of pronouncing unknown
- words phonetically or spelling them out. Developers can vary voice
- pitch (130-220 Hz) and rate (65-200 wpm), synchronize speech to
- other events, have multiple channels of speech to the same or
- different boards, etc. Speech sampling options: 8-bit linear;
- 8-bit companded at 11 kHz (Windows); 8-bit mu-law PCM at 8 or 11
- kHz; 16-bit linear at 11 kHz.
- * Cost: Sound Bytes may be licensed for internal use or resale. Site
- license fee= $3750. Resale or Internal runtime fees= 2% of net
- sales price per runtime sold, OR $150 per telephone port, OR per
- unit pricing for internal use determined case-by-case.
- * Misc: Demo disks are available for Windows and the Mac.
- * Availability: Natural Speech Technologies, Inc.
- Ph: (619) 457-2526.
-
-
-
- spchsyn.exe
-
- * Platform: DOS
- * Availability: By anonymous ftp as a self extracting DOS archive.
- ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/spchsyn.exe
- * Requirements: May require special TI product(s), but all source is
- there.
-
-
-
- "Speak" - a Text to Speech Program
-
- * Platform: Sun SPARC
- * Description: Text to speech program based on concatenation of
- pre-recorded speech segments. A function library can be used to
- integrate speech output into other code.
- * Hardware: SPARC audio I/O
- * Availability: by anonymous ftp
- ftp://wilma.cs.brown.edu/pub/speak.tar.Z
-
-
-
- Text to phoneme program (1)
-
- * Platform: unknown
- * Description: Text to phoneme program. Based on Naval Research
- Lab's set of text to phoneme rules.
- * Availability: by anonymous ftp
- ftp://shark.cse.fau.edu/pub/src/phon.tar.Z
-
-
-
- Text to phoneme program (2)
-
- * Platform: unknown
- * Description: Text to phoneme program.
- * Availability: by anonymous ftp
- ftp://ftp.doc.ic.ac.uk/packages/unix-c/utils/phoneme.c.gz
-
-
-
- Text to phoneme program (3)
-
- * Description: A public domain version of the same Naval Research
- Lab text to phoneme rules.
- * Availability: By anonymous ftp
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/english2phoneme.tar.gz
-
-
-
- Tinytalk
-
- * Platform: DOS / Windows???
- * Description: Shareware package is a speech 'screen reader' which
- is used by many blind users.
- * Price: Tinytalk is now $150. There are package deals on Tinytalk
- with various speech synthesizers.
- * Availability: Tinytalk is available by anonymous ftp from the
- following site
-
- Files: ttexe167.zip and ttdoc167.zip (executable and
- documentation)
- ftp://ftp.netcom.com/pub/eb/ebohlman/
-
- (Note: it is a busy ftp server.)
- * Contact: Eric Bohlman
-
- OMS Development
- 610-B Forest Ave., Wilmette, IL 60091
- Ph: (800)831-0272 Fax: 708-251-5793
- Outside North America: (708)-251-5787
- Email: ebohlman@netcom.com
-
-
-
- TrueTalk
-
- * Platform: Sun Sparcstation 1+/2/LX/5/10/20 with SunOS 4.1.3, or
- SGI Indy/Indigo/Indigo2 with IRIX 5.2. More platforms in
- development.
- * Description: Personal TrueTalk, by Entropic Research Laboratory,
- Inc., is an all-software Text-to-Speech (TTS) system designed to
- voice-enable UNIX X-Windows workstations. It combines a graphical
- interface with a powerful TTS engine based on technology developed
- by AT&T Bell Laboratories. Features include:
- + Intelligible, prosodically natural speech.
- + Text taken from file input, highlighted X selections, the
- interface scratch pad, other programs connected through a
- TCP/IP socket, or Tcl/Tk applications via the Tk "send"
- mechanism.
- + Stop, pause and resume while speech is in progress.
- + Visual indication of corresponding text position when paused.
- + Nine speaking voices, with Male and Female versions of each
- voice.
- + Adjustable speaking rate and volume.
- + Supports drop-in text filters; "email" and "lively" examples
- included.
- + Audio output through workstation headphones or speaker.
- + Complete on-line documentation, including mouse-activated
- help windows.
- * Misc: A more detailed description of TrueTalk is available on the
- Entropic WWW server: http://www.entropic.com/truetalk.com
- * Availability: You can obtain Personal TrueTalk through the
- Internet. For details, see
-
- ftp://ftp.entropic.com/pub/truetalk/README.ptt
-
- Personal TrueTalk is available free of charge for evaluation
- purposes. You can fully-enable your evaluation copy at any time by
- purchasing a license key from Entropic.
- * Requirements: 12MB disk space, 8MB process size (24MB system RAM
- recommended).
- * Cost: US$495; US$395 academic
- * Contact: Entropic Research Laboratory, Inc.,
- Washington, D.C.
- Voice: 1-800-ENTROPIC (North America), (202) 547 1420
- Fax: (202) 547-6648
- Email: truetalk@entropic.com
- WWW: http://www.entropic.com/
-
-
-
- TruVoice from Centigram
-
- * Platform: Windows-NT, Windows 95, Windows 3.1 (limited release),
- Sun Solaris 2.x
- * Description: TruVoice, an advanced text-to-speech converter, is
- available for multiple environments. TruVoice converts text into
- spoken language. TruVoice adds intelligible, natural-sounding
- speech to sound enabled platforms.
- + Small, 1.5MB, memory footprint
- + Advanced text pre-processing
- + No vocabulary restrictions
- + User-definable pronunciation dictionary
- + Accurately pronounces surnames and place names
- + Preprocessor provides e-mail and spreadsheet reading
- capabilities and expands abbreviations.
- + Multiple languages available: American English, Latin
- American Spanish, German, French, Italian
- + Flexible pitch, volume and speech rate
- + Intonation support for punctuation
- + Supports navigational capabilities such as pause, resume and
- jump forward / jump back at sentence or word boundaries
- More detailed information is provided in the brochure page on the
- Centigram WWW site.
- A demonstration of TruVoice is available on the Centigram WWW
- pages.
- * Cost:
- + Windows versions are $495 for the SDK
- + Solaris versions are $995
- + Contact Centigram for other pricing.
- * Contact: TruVoice Sales
- Centigram Communications Corporation
- 91 East Tasman Drive, San Jose, CA 95134
- Ph: (408) 944-0250 Fax: (408) 428-3732
- Demo: 800-746 1632
- Email: webmaster@centigram.com
- WWW: http://www.centigram.com/
-
-
-
- WinSpeech
-
- * Platform: Windows
- * Description: WinSpeech is a text-to-speech application that reads
- text and produces speech to the audio output. Features basic text
- editing tools, talk from editing window, DDE server allows other
- Windows applications to send text for talking, coach mode for
- providing audio instructions throughout the program, dictionary
- editing tools for customizing pronunciation.
- WSPLIB text-to-speech DLL is a speech functions library for
- developers. More information available by email.
- * Requirements: System requirements: IBM PC or compatible computer
- with Windows 3.1 or higher. Sound card is recommended but not
- required.
- * Availability: Freeware available through the PC WholeWare WWW
- page.
- * Contact: PC WholeWare
- 33 Justin Street, Lexington, MA 02173, U.S.A.
- Email: info@pcww.com
- WWW: http://www.pcww.com/index.html
-
-
-
- WreadFiles: File reader for Commodore Amiga
-
- * Platform: Commodore Amiga
- * Description: WreadFiles is a vocal text file reader program for
- use on the Commodore Amiga. The text is printed to the screen and
- spoken. Features include:
- + Text is read in sentences rather than lines.
- + Dynamic Speech Correction on over 4000 words or word
- fragments.
- + Pronunciations for many place names, personal names, foreign
- names, foreign expressions and abbreviations.
- + Run from Workbench or CLI.
- + Used with A1000 (OS 1.3), A3000 (OS 2.04-2.1), and A4000 (OS
- 3.0)
- * Requirements: Standard Amiga Translator.library and
- Narrator.device required. 2.04 versions recommended. 1 Meg or more
- ram recommended. External speakers required.
- * Availability: No fee requested for non-commercial use. From:
- + GEnie: Page 555,3 File Number 24627
- + Aminet
- ftp://ftp.wustl.edu/pub/aminet/util/misc/WreadFiles47.lha
- * Contact: Written by Michael L. Barlow
- Email: M.Barlow1@GEnie.geis.com or mbarlow@pacific.telebyte.com or
- MikeB@cuix.pscu.com
-
-
-
- ZMD Speech Synthesis
-
- "Speaky" Speech Synthesis from ZMD
-
- * Platform: DSP solution for platform independent speech synthesis
- implementation
- * Description: "Speaky" provides a German speech synthesis system in a
- DSP solution. It includes pre-processing of input ASCII text with
- unlimited vocabulary, both parametric and non-parametric speech
- synthesis algorithms, and prosody modelling. More detailed
- information and audio samples can be found at the ZMD WWW Site.
- * Contact: Zentrum Mikroelektronik Dresden GmbH
- Grenzstrasse 28, D-01109 Dresden, Germany
- Ph: +49-351-8822-306, Fax: +49-351-8822-337
- Email: assp@zmd-gmbh.de
- WWW: http://www.zmd-gmbh.de/
-
- ZMD PCMCIA Speech Synthesis Card
-
- * Platform: MS-DOS, Windows
- * Description: Complete text-to-speech synthesis system for the
- German language with unlimited vocabulary using VOICE Processor
- "Speaky". The required pre-processing of the input ASCII text is
- performed by a software program that is downloaded automatically
- from the PCMCIA Speech Synthesis Card during the card's
- initialising routine. Headphone or active loudspeaker can be
- connected directly for signal output. More detailed information
- and audio samples can be found at the ZMD WWW Site.
- * Requirements: PC Card slot, Card & Socket Services Software
- * Contact: Zentrum Mikroelektronik Dresden GmbH
- Grenzstrasse 28, D-01109 Dresden, Germany
- Ph: +49-351-8822-306, Fax: +49-351-8822-337
- Email: assp@zmd-gmbh.de
- WWW: http://www.zmd-gmbh.de/
-
-
- ___________________________________________________________________________
-
- Speech Recognition
-
- comp.speech FAQ Section 6
-
- * SpeechLinks: Speech Recognition
- * Q6.1: What is speech recognition?
- * Q6.2: How is speech recognition performed?
- * Q6.3: How can I build a simple speech recogniser?
- * Q6.4: References & books on speech recognition
- * Q6.5: Speech Recognition Hardware/Software
- * Q6.6: Speaker Recognition (Verification and Identification)
- * Q6.7: Integrated Speech Products
-
-
- ___________________________________________________________________________
-
- Q6.1: What is speech recognition?
-
- Automatic Speech Recognition
-
- Automatic speech recognition is the process by which a computer maps
- an acoustic speech signal to text.
-
- Automatic speech understanding is the process by which a computer maps
- an acoustic speech signal to some form of abstract meaning of the
- speech.
-
- What does speaker dependent / adaptive / independent mean?
-
- A speaker dependent system is developed to operate for a single
- speaker. These systems are usually easier to develop, cheaper to buy
- and more accurate, but not as flexible as speaker adaptive or speaker
- independent systems.
-
- A speaker independent system is developed to operate for any speaker
- of a particular type (e.g. American English). These systems are the
- most difficult to develop and the most expensive, and their accuracy
- is lower than that of speaker dependent systems. However, they are
- more flexible.
-
- A speaker adaptive system is developed to adapt its operation to the
- characteristics of new speakers. Its difficulty lies somewhere
- between speaker independent and speaker dependent systems.
-
- What does small/medium/large/very-large vocabulary mean?
-
- The size of vocabulary of a speech recognition system affects the
- complexity, processing requirements and the accuracy of the system.
- Some applications only require a few words (e.g. numbers only), others
- require very large dictionaries (e.g. dictation machines). There are
- no established definitions, but as a rough guide:
-
- * small vocabulary - tens of words
- * medium vocabulary - hundreds of words
- * large vocabulary - thousands of words
- * very-large vocabulary - tens of thousands of words.
-
- What does continuous speech or isolated-word mean?
-
- An isolated-word system operates on single words at a time - requiring
- a pause between saying each word. This is the simplest form of
- recognition to perform because the end points are easier to find and
- the pronunciation of a word tends not to affect others. Thus, because the
- occurrences of words are more consistent they are easier to recognise.
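-
- To illustrate why pauses make end points easier to find, here is a
- minimal Python sketch of an energy-threshold end point detector. It
- is not taken from any system described in this FAQ. The function
- name find_endpoints, the frame length and the threshold are
- assumptions chosen for the example, and practical systems usually
- add noise-floor estimation and minimum-duration checks.
-
```python
def find_endpoints(samples, frame_len=80, threshold=0.01):
    """Return (start, end) sample indices of the active region, or None.

    samples   -- list of floats, roughly in the range -1.0 .. 1.0
    frame_len -- samples per non-overlapping analysis frame (assumed value)
    threshold -- mean-square energy above which a frame counts as speech
    """
    # Short-time energy: mean squared amplitude of each frame.
    energies = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)

    # Frames whose energy exceeds the threshold are treated as speech.
    active = [k for k, e in enumerate(energies) if e > threshold]
    if not active:
        return None

    # The word spans from the first active frame to the end of the last one.
    return active[0] * frame_len, (active[-1] + 1) * frame_len


if __name__ == "__main__":
    # Silence, then a burst of "speech", then silence again.
    signal = [0.0] * 160 + [0.5] * 240 + [0.0] * 160
    print(find_endpoints(signal))
```
-
- In continuous speech there is often no low-energy gap between words,
- which is one reason a simple threshold like this stops being enough.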
-
- A continuous speech system operates on speech in which words are
- connected together, i.e. not separated by pauses. Continuous speech is
- more difficult to handle because of a variety of effects. First, it is
- difficult to find the start and end points of words. Another problem
- is "coarticulation". The production of each phoneme is affected by the
- production of surrounding phonemes, and similarly the start and
- end of words are affected by the preceding and following words. The
- recognition of continuous speech is also affected by the rate of
- speech (fast speech tends to be harder).
-
-
- ___________________________________________________________________________
-
- Q6.2: How is speech recognition performed?
-
- A wide variety of techniques are used to perform speech recognition,
- and there are many types and levels of speech recognition, analysis
- and understanding.
-
- Typically speech recognition starts with the digital sampling of
- speech. The next stage is acoustic signal processing. Most techniques
- include spectral analysis; e.g. LPC analysis (Linear Predictive
- Coding), MFCC (Mel Frequency Cepstral Coefficients), cochlea modelling
- and many more.
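-
- As a rough illustration of the first steps of acoustic signal
- processing, the Python sketch below cuts a sampled signal into
- overlapping frames, applies a Hamming window to each, and converts
- frequencies to the mel scale used by MFCC analysis. The function
- names and parameter values are assumptions for this example, and the
- mel formula shown is one commonly used variant, not the only one.
-
```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (a commonly used formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def windowed_frames(samples, frame_len, hop):
    """Split samples into overlapping frames, each Hamming-windowed.

    frame_len -- samples per frame
    hop       -- samples between the starts of successive frames
    """
    # Hamming window: tapers the frame edges to reduce spectral leakage.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


if __name__ == "__main__":
    frames = windowed_frames([1.0] * 100, frame_len=20, hop=10)
    print(len(frames), "frames of", len(frames[0]), "samples")
    print("1000 Hz is about", round(hz_to_mel(1000.0)), "mel")
```
-
- Spectral analysis (LPC, MFCC and so on) is then applied to each
- windowed frame to produce one feature vector per frame.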
-
- The next stage is recognition of phonemes, groups of phonemes and
- words. This stage can be achieved by many processes such as DTW
- (Dynamic Time Warping), HMM (hidden Markov modelling), NNs (Neural
- Networks), expert systems and combinations of techniques. HMM-based
- systems are currently the most commonly used and most successful
- approach.
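-
- Of the techniques listed, DTW is the simplest to show in a few
- lines. The Python sketch below computes a basic DTW distance between
- two sequences so that the same word spoken at different speeds can
- still match. For brevity each frame is a single number; a real
- recogniser would compare feature vectors (e.g. LPC or MFCC) and
- would usually add path constraints. The function name dtw_distance
- is an assumption for this example.
-
```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)

    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed moves: advance in a, advance in b, or advance in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    template = [1, 2, 3]
    slow_version = [1, 2, 2, 3]      # same shape, one frame repeated
    different = [3, 1, 3]
    print(dtw_distance(template, slow_version))
    print(dtw_distance(template, different))
```
-
- An unknown word is recognised by computing this distance against
- each stored template and choosing the smallest.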
-
- Most systems utilise some knowledge of the language to aid the
- recognition process.
-
- Some systems try to "understand" speech. That is, they try to convert
- the words into a representation of what the speaker intended to mean
- or achieve by what they said.
-
-
- ___________________________________________________________________________
-
- Q6.3: How can I build a simple speech recogniser?
-
- QUICKY RECOGNIZER sketch:
-
- Doug Danforth provides a detailed account in article 253 in the
- comp.speech archives. A summary is provided below. It is also
- available by anonymous ftp
-
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition
-
- This is a simple recognizer that should give you 85%+ recognition
- accuracy. The accuracy is a function of the words you have in your
- vocabulary. Long distinct words are easy. Short similar words are
- hard. You can get 98+% on the digits with this recognizer.
-
- Overview:
-
- * Find the beginning and end of the utterance.
- * Filter the raw signal into frequency bands.
- * Cut the utterance into a fixed number of segments.
- * Average data for each band in each segment.
- * Store this pattern with its name.
- * Collect training set of about 3 repetitions of each pattern
- (word).
- * Recognize unknown by comparing its pattern against all patterns in
- the training set and returning the name of the pattern closest to
- the unknown.
-
- Many variations upon the theme can be made to improve the performance.
- Try different filtering of the raw signal and different processing
- methods.
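-
- The overview can be sketched in Python roughly as follows. This is
- not Doug Danforth's code: a naive DFT stands in for the filter bank,
- the utterance is assumed to have been trimmed to its end points
- already, and all names and parameter values (band_energies,
- make_pattern, recognize, 4 bands, 4 segments, 32-sample frames) are
- assumptions made for the example.
-
```python
import cmath
import math


def band_energies(frame, n_bands):
    """Energy in n_bands equal-width frequency bands, via a naive DFT."""
    n = len(frame)
    power = []
    for k in range(1, n // 2 + 1):            # skip the DC bin
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        power.append(abs(s) ** 2)
    width = len(power) // n_bands             # leftover bins are ignored
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def make_pattern(samples, n_segments=4, n_bands=4, frame_len=32):
    """Turn a trimmed utterance into a fixed-length pattern."""
    # Band energies for each non-overlapping frame.
    feats = [band_energies(samples[i:i + frame_len], n_bands)
             for i in range(0, len(samples) - frame_len + 1, frame_len)]
    pattern = []
    for s in range(n_segments):
        # Cut the utterance into a fixed number of segments ...
        lo = s * len(feats) // n_segments
        hi = (s + 1) * len(feats) // n_segments
        chunk = feats[lo:hi] or [[0.0] * n_bands]
        # ... and average each band within the segment.
        for b in range(n_bands):
            pattern.append(sum(f[b] for f in chunk) / len(chunk))
    return pattern


def recognize(pattern, training_set):
    """Return the name of the stored pattern closest to the unknown.

    training_set -- list of (name, pattern) pairs; several repetitions
                    of the same word may appear under the same name.
    """
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))
    return min(training_set, key=lambda item: dist(pattern, item[1]))[0]


def tone(k, amp=1.0, n=256):
    """Synthetic test signal: k cycles per 32 samples (stand-in for a word)."""
    return [amp * math.sin(2.0 * math.pi * k * t / 32) for t in range(n)]


if __name__ == "__main__":
    training = [("low", make_pattern(tone(2))),
                ("high", make_pattern(tone(12)))]
    print(recognize(make_pattern(tone(2, amp=0.8)), training))
```
-
- With real speech, each word in the training set would be recorded
- about three times, as suggested above, and every recording stored
- as its own (name, pattern) pair.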
-
- Public Domain Recognition Software
-
- Q6.5 contains information on public domain speech recognition software
- including: Lotec and Myers' Hidden Markov Model software.
-
- Discrete Hidden Markov Model Demonstration Software
-
- Hidden Markov Models (HMMs) are widely used in speech recognition
- systems. Joe Picone has put together some demonstration software for
- basic discrete HMMs including Viterbi and Baum-Welch training and
- evaluation, random sequence generation (generating data from a model),
- and model updating (useful for incremental training). There is a
- simple demo program that supports all of these modes from command line
- arguments. This allows experiments to test the classic coin-toss
- examples commonly described in textbooks. The code closely parallels
- the following textbook:
-
- * J.R. Deller, Jr., J.G. Proakis, and J.H.L. Hansen, Discrete-Time
- Processing of Speech Signals, MacMillan, 1993, ISBN:
- 0-02-328301-7.
-
- The code is written in C++ and is intended to facilitate learning and
- understanding of the algorithms. The code is available on the ISIP web
- site:
- http://www.isip.msstate.edu/software/
-
- Lecture notes corresponding to the examples are also available:
- http://www.isip.msstate.edu/publications/1996/speech_recognition_short_course
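-
- For readers who want to see the idea before downloading anything,
- here is a small Python sketch of Viterbi decoding for a discrete HMM
- on a coin-toss example. It is not the ISIP C++ code and does not
- follow its interface. The two-coin model and all of its
- probabilities are invented for illustration, and the sketch assumes
- every probability is non-zero so that logarithms can be taken.
-
```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for obs under a discrete HMM.

    Returns (path, log_probability). Works in log space to avoid
    underflow; assumes all probabilities are strictly positive.
    """
    # Initialisation: score of starting in each state and emitting obs[0].
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
               for s in states}]
    back = []

    # Recursion: best predecessor for each state at each time step.
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda r: scores[-1][r] + math.log(trans_p[r][s]))
            row[s] = (scores[-1][prev] + math.log(trans_p[prev][s])
                      + math.log(emit_p[s][o]))
            ptr[s] = prev
        scores.append(row)
        back.append(ptr)

    # Termination and backtracking.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, scores[-1][last]


# Hypothetical two-coin model: one fair coin, one biased towards heads.
STATES = ["fair", "biased"]
START = {"fair": 0.5, "biased": 0.5}
TRANS = {"fair": {"fair": 0.9, "biased": 0.1},
         "biased": {"fair": 0.1, "biased": 0.9}}
EMIT = {"fair": {"H": 0.5, "T": 0.5},
        "biased": {"H": 0.9, "T": 0.1}}

if __name__ == "__main__":
    path, logp = viterbi("HHHHTTTT", STATES, START, TRANS, EMIT)
    print(path, logp)
```
-
- Baum-Welch training, which the demonstration software also covers,
- re-estimates the model probabilities from data and is not shown here.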
-
-
- ___________________________________________________________________________
-
- Q6.4: References & books on speech recognition
-
- * Product Reviews and Comparisons
- * Using Speech Recognition: Health Issues
- * On the WWW
- * Technology: General and Introductory
- * Technical
- * Course Notes
- * Bibliographies and Reference Lists
-
- Product Reviews and Comparisons
-
- * "Talk Show", Wayne Rash Jr., PC Magazine (USA), Dec 20, 1994.
- * "Seybold Report on Desktop Publishing" published a nine-page,
- head-to-head comparison of Dragon's DOS software with IBM's OS/2
- software. March 7, 1994; Volume 8, Number 7; Pages 3-11;
- ISSN:0889-9762; Seybold Publications, P.O. Box 644, Media, PA
- 19063 USA, phone (610) 565-2480.
- * McGraw-Hill Inc.'s "BYTE, the Magazine of Technology Integration,"
- published a two-page review of IBM's Personal Dictation System
- software. May 1994; Volume ?, Number ?; Pages 145-146;
- ISSN:0360-5280; Editorial, Executive, and Circulation address: One
- Phoenix Mill Lane, Peterborough, NH 03458 USA, phone ?
-
- Using Speech Recognition: Health Issues
-
- * The National Center for Voice and Speech provides some basic
- information on preserving "Vocal Health" on their WWW site:
- http://www.shc.uiowa.edu/hygiene/home.html
- * Voice Users Mailing List: details in Q1.4 of the FAQ.
- * Typing Injury FAQ: http://www.cs.princeton.edu:80/~dwallach/tifaq/
- has a range of information on Typing Injuries, avoiding them,
- alternatives and more.
- * Typing Injuries Page:
- http://alumni.caltech.edu/~dank/typing-archive.html has links to
- dozens of useful resources.
- * Voice Problems -- Prevention and Correction: advice on preserving
- your voice with specific hints for using speech recognition.
- ftp://ftp.csua.berkeley.edu/pub/typing-injury/voice-problems
- * "Talking to a PC May Be Hazard To Your Throat", by Julie Chao in
- the Wall Street Journal.
- * "Talking to Computers Has its Hazards", by Gordon Arnaut in The
- Globe and Mail
-
- On the WWW
-
- * Survey of the State of the Art in Human Language Technology:
- Report edited by Ronald A. Cole et al. with a section on Spoken
- Input Technologies.
- http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node2.html
-
- Technology: General and Introductory
-
- Some general introduction books on speech recognition technology:
-
- * Fundamentals of Speech Recognition; Lawrence Rabiner & Biing-Hwang
- Juang Englewood Cliffs NJ: PTR Prentice Hall (Signal Processing
- Series), c1993, ISBN 0-13-015157-2
- * Speech recognition by machine; W.A. Ainsworth London: Peregrinus
- for the Institution of Electrical Engineers, c1988
- * Speech synthesis and recognition; J.N. Holmes Wokingham: Van
- Nostrand Reinhold, c1988
- * Speech Communication: Human and Machine, Douglas O'Shaughnessy;
- Addison Wesley series in Electrical Engineering: Digital Signal
- Processing, 1987.
- * Electronic speech recognition: techniques, technology and
- applications, edited by Geoff Bristow, London: Collins, 1986
- * Readings in Speech Recognition; edited by Alex Waibel & Kai-Fu
- Lee. San Mateo: Morgan Kaufmann, c1990
-
- Technical
-
- * Hidden Markov models for speech recognition; X.D. Huang, Y. Ariki,
- M.A. Jack. Edinburgh: Edinburgh University Press, c1990
- * Speech Recognition: The Complete Practical Reference Guide; T.
- Schalk, P. J. Foster: Telecom Library Inc, New York; ISBN
- 0-936648-39-2; 377 pages; paperback only. Aimed at readers who work
- with speech recognition in a telephony environment and wish to use
- call processing hardware based in PCs. The examples are written
- around Dialogic hardware.
- * Automatic speech recognition: the development of the SPHINX
- system; by Kai-Fu Lee; Boston; London: Kluwer Academic, c1989
- * An Introduction to the Application of the Theory of Probabilistic
- Functions of a Markov Process to Automatic Speech Recognition, S.
- E. Levinson, L. R. Rabiner and M. M. Sondhi; in Bell Syst. Tech.
- Jnl. v62(4), pp1035--1074, April 1983
- * Review of Neural Networks for Speech Recognition, R. P. Lippmann;
- in Neural Computation, v1(1), pp 1-38, 1989.
- * Automatic Speech and Speaker Recognition: Advanced Topics, C.H.
- Lee, F.K. Soong and K.K. Paliwal (Eds.), Kluwer, Boston, 1996.
-
- Course Notes
-
- * Joseph Picone of the Institute for Signal and Information
- Processing (ISIP) at Mississippi State University has put the
- course notes for "Fundamentals of Speech Recognition" on the WWW.
- The course covers background probability and phonetics/acoustics,
- speech signal analysis, dynamic programming, dynamic time warping,
- hidden Markov modelling, language modelling, neural networks, etc.
- The WWW site provides the syllabus and lecture notes.
- WWW: http://www.isip.msstate.edu/publications/1996/ee_8993/
-
- Bibliographies and Reference Lists
-
- * WWW searchable online bibliography for Phonetics and Speech
- Technology with more than 8000 entries. Provided by Institut fur
- Phonetik at Johann Wolfgang Goethe-Universitat Frankfurt.
- http://www.uni-frankfurt.de/~ifb/bib_engl.html
- * Computational Speech Processing: Speech Analysis, Recognition,
- Understanding, Compression, Transmission, Coding, Synthesis ; Text
- to Speech Systems, Speech to Tactile Displays, Speaker
- Identification, Prosody Processing : BIBLIOGRAPHY, by Conrad F.
- Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA
- inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
- See also: http://gomer.mlink.net/infolingua.html
-
-
- ___________________________________________________________________________
-
- Q6.5: Speech Recognition Hardware and Software
-
- The number of speech recognition packages and the information about
- them are changing rapidly. Any help with keeping this information up
- to date will be appreciated.
-
- * Products in the FAQ
- * Speech Recognition Processors (ICs)
- * Recognition Information on the WWW
- * Speech Recognition Resellers and Value-Add
-
- In the FAQ:
-
- The following speech recognition software/hardware is described in the
- comp.speech FAQ.
-
- _Apple Macintosh_
- * Digital Dreams Speech Recognition Plug-Ins
- * Dragon Dictation Products
- * Macintosh Speech Recognition Manager
- * PowerSecretary
-
- _Windows (including 95, NT, 3.1)_
- * AT&T Watson Speech Recognition
- * Cambridge Voice for Windows
- * CustomVoice and CustomTelephone: A&G Graphics Interface Inc.
- * DragonDictate for Windows
- * Dragon Dictation Products
- * Dragon Developer Tools
- * Ficomp Interpreter 6000
- * IBM VoiceType Dictation and Control
- * IN CUBE
- * Kurzweil Speech Recognition (2 products)
- * Lernout & Hauspie ASR SDK
- * Listen for Windows 2.0 from Verbex Voice Systems
- * Microsoft Speech Recognition
- * NCC Dictate
- * Phonetic Engine 500 (PE500) from Speech Systems, Inc.
- * Philips Speech Recognition (2 products)
- * ProNotes Voice Tools
- * PureSpeech
- * smARTspeak from Advanced Recognition Technologies, Inc.
- * Visual Voice from Stylus Innovation
- * VoiceAssist for Windows from Creative Labs, Inc.
- * VoiceServer for Windows
- * Whisper
- * WildCard Speech Products
-
- _DOS_
- * DATAVOX - French
- * Dragon Developer Tools
- * Ficomp Interpreter 6000
- * Jialong He's Speech Recognition Research Tool
- * smARTspeak from Advanced Recognition Technologies, Inc.
- * Votan VPC2100 Voice Card and VSP 1010 Speech Processor
-
- _OS/2_
- * IBM VoiceType Dictation and Control
-
- _Unix_
- * AbbotDemo
- * BBN Hark Telephony Recognizer
- * EARS: Single Word Recognition Package
- * Ficomp Interpreter 6000
- * Hidden Markov Model Toolkit (HTK) from Entropic
- * IN CUBE
- * Jialong He's Speech Recognition Research Tool
- * Lotec Speech Recognition Package
- * Myers' Hidden Markov Model software
- * NICO Artificial Neural Network Toolkit
- * Nuance Speech Recognition System
- * PureSpeech
- * recnet
-
- _Integrated Circuits and Dedicated Hardware_
- * HM2007 - Speech Recognition Chip
- * OKI VRP6679 - Speech Recognition Chip
- * Sensory Inc. Integrated Circuits
- * Speech Commander - Verbex Voice Systems
- * Voice Control Systems Recognition
- * VCS 2030 & 2060 Voice Dialer
-
- _Other Platforms_
- * Simon Says (NeXT)
- * Voice Command Line Interface (Amiga)
- * Visus SpeechKit
-
- _Unknown_
- * Berkeley Restaurant Project (BeRP)
- * Lernout & Hauspie ASR (3 products)
- * Voice-Trek 2.0
- * Voicetek Corp.
- * Voice Processing Corporation Speech Recognition Product Line
-
- Speech Recognition Processors (ICs)
-
- Jean-Pierre Lereboullet has put together a detailed list of Voice
- Recognition Processors which covers about 15 ICs and pieces of related
- hardware (including D6106, HM2007, MSM6679, RSC-164, TC8860F/64F/65F,
- 5A128).
- The document is available on the comp.speech ftp server:
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/VoiceRecognitionProcessors
-
- Recognition Information on the WWW
-
- In addition to the entries on speech recognition in this FAQ, the
- following WWW sites provide information on speech recognition:
-
- Commercial Speech Recognition: Russ Wilcox of PureSpeech Inc.
-
- http://www.tiac.net/users/rwilcox/speech.html
-
- Macintosh Speech Resources and Apps
- http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
-
- Speech Recognition Information: 21st Century Eloquence
- http://www.voicerecognition.com/
-
- Applied Speech Technology Laboratory of CSLI at Stanford
- http://csli-www.stanford.edu/users/bscott/SRTech.html
-
- Speech Toys Speech Recognition Page
- http://www.speechtoys.com/spchtoys/sprec.html
-
- Speech recognition product lists: postings to comp.speech
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/SpeechRecognitionProducts
-
- Search Alta Vista for Speech Recognition
-
- Search Lycos for Speech Recognition
-
- Yahoo pages on Speech Recognition
- http://www.yahoo.com/business/corporations/computers/software/voice_recognition/
- http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Recognition/
-
- Speech Recognition Resellers and Value-Added Services
-
- 1stVoice
- 2470 El Camino Real, Suite 110, Palo Alto CA 94306-1701
- Ph: 415-857-1320, Fax: 415-856-6996
- WWW: http://www.1stvoice.com/
- Email: mail@1stvoice.com
- Dragon Dictation Products
-
- 21st Century Eloquence
- 325-A Royal Poinciana Plaza, Palm Beach, Florida 33480, USA
- Ph: 800-245-2133, Fax: 407-835-4901
- WWW: http://www.voicerecognition.com/
- Kurzweil, IBM VoiceType, Dragon, Kolvox
-
- Auscript (Australia)
- Suite 2, Level 3, 60-70 Elizabeth St, Sydney, NSW 2000,
- Australia
- Ph: +61-2-238 6565, Fax: +61-2-238 6566
- WWW: http://www.auscript.com.au/
- Dragon Systems
-
- BRITE
- WWW: http://www.brite.com/
- Computer Telephony Integration & Interactive Voice Response
-
- DAX Systems, Inc.
- 30 Chapin Road, Unit 1201, P.O. Box 778, Pine Brook, NJ/USA
- 07058
- Ph: +1-201-227-8111, Fax: +1-201-227-8197
- Email: info@daxsystems.com
- WWW: http://www.daxsystems.com/
- Computer Telephony and Integrated Voice Response
-
- HealthCare Resources
- 1444 Aviation Blvd, #103, Redondo Beach, CA 90278, USA
- Ph: +1-310-937-5156, Fax: +1-310-937-5159
- EMail: Scalif@AOL.COM
- Power Secretary & Dragon Dictate. Specializing in:
- Medical/Dental, Motion Picture Industry, Carpal Tunnel related
- and Disabled Persons.
-
- O'Brien Resources
- Ph: (540) 347-4988 (Address unknown)
- Email: obrien@crosslink.net
- WWW: http://www.crosslink.net/~obrien/
- Kurzweil Voice Recognition Products
-
- SCI VoiceAutomated
- 215 1/2 Main Street, Huntington Beach, CA 92648, USA
- Ph: 800-597-6600, Ph: +1-714-969-7632, Fax: +1-714-969-0122
- http://www.voiceautomated.com/
- IBM VoiceType, Kurzweil Voice, DragonDictate and Philips
- speech.
-
- Synapse
- 3095 Kerner Blvd., Suite S, San Rafael, CA 94901, USA
- Ph: (415) 455-9700, Fax: (415) 455-9801
- Email: SYNAPSE_ADAPTIVE@msn.com
- WWW: http://www.synapseadaptive.com/
- Dragon Systems, Kurzweil and IBM products.
-
- Talk Technology
- Ph: 1-800-270-1672, Fax: 1-516-360-1213
- Email: info@talktechnology.com
- http://www.talktechnology.com/
-
- Talk Technology, Inc.
- Tel: +1-718-745-9199, Fax: +1-718-499-6480
- Email: mnm@pipeline.com
- WWW: http://www.usbusiness.com/talk/
- Dragon Dictate and portable (notebook) solutions
-
- ToppCopy Telecom
- Email: ffalzett@toppcopy.com
- WWW: http://www.toppcopy.com/
- Philips Digital Dictation
-
- VoiceWare Systems
- 230 California Street, Suite 410, San Francisco, CA 94111
- Ph: (415) 433-2001, Fax: (415) 433-6909
- Email: info@talk2type.com
- WWW: http://www.talk2type.com/home.htm
- IBM, Dragon Systems, Kurzweil Applied Intelligence, WildCard
- Technologies
-
- WorkLink
- A.D.A. Solutions by WorkLink
- 2566-A Telegraph Avenue, Berkeley, California 94704 USA
- Ph: 510-848-8363, Fax:510-848-7322
- WWW: http://www.worklink.net/
- Email: wayne@worklink.net
- Dragon Dictation Products
-
-
-
- AbbotDemo
-
- * Platform: SunOS4, IRIX, Linux, HP-UX
- * Description: Large vocabulary, speaker independent, continuous
- automatic speech recognition system. Uses recurrent neural
- networks and hidden Markov models with a 5,000 word vocabulary
- (upgradable) and a trigram word grammar. Includes a front end for
- waveform capture and display (including spectrogram) and a
- graphical display of the phoneme representation as well as a
- rewriting display of the best guess word sequence.
- * Requirements: UN*X, X, 8 Mbyte free RAM, 486DX or faster
- processor, 16 bit soundcard, reasonable quality microphone and a
- copy of the Wall Street Journal newspaper.
- * Price: Free for non-commercial use
- * Availability: By anonymous ftp from
-
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo
-
- * Note 1: This is not a complete system for dictation.
- * Note 2: At present there are no sources with this distribution.
- For sources for an earlier version see the recnet entry.
- * Note 3: Not supported.
- * Contact: AbbotDemo@compute.demon.co.uk
- Tony Robinson
- Cambridge University Engineering Department
- Trumpington Street, Cambridge, CB2 1PZ, UK
- Tel: +44-1223-332815 Fax: +44-1223-332662
-
-
-
- AT&T Watson Speech Recognition
-
- * Platform: Windows 95/NT on a Pentium 75 MHz or higher
- * Description: Watson is a software implementation of AT&T Bell
- Laboratories voice processing technology. Watson includes BLASR
- Speech Recognition and FlexTalk speech synthesis (see Q5.5). It
- requires no special hardware to run other than a standard sound
- card and/or phone card. Technical details for BLASR Speech
- Recognition include:
- + Compliant with Microsoft Speech API and Telephone API
- + Speaker independent, continuous speech recognition
- + Fast, run-time vocabulary change
- + Open mic and telephone line environments
- + SoundBlaster compatible sound card and drivers required
- + Subword models and whole-word digit models
- + Background, silence, and filler/garbage models
- + 50 word name vocabulary or 100 word phrase real-time
- recognition with 95% accuracy
- + Rejection of out-of-vocabulary words
- + American English only - other languages in development
- + Barge-in speech begin/end notification - requires hardware
- echo cancellation
- The AT&T Advanced Speech Products Group home page provides more
- detailed information including a Frequently Asked Questions list,
- information for application developers on the Independent Software
- Vendor (ISV) Program (including info on the SDK, licensing, and
- the training program).
- * Requirements: Uses 2 MB RAM, 10 MB Disk. Requires a Pentium 75 MHz
- or higher CPU (uses
- * Cost and Availability: WATSON is a software-based speech platform
- with a Software Developers Kit (SDK) that allows application
- developers to use voice processing in their applications. It is
- not available as a stand-alone product.
- Licensing information (inc. price) is provided in the AT&T
- Advanced Speech Products Group home page
- * See also: Watson FlexTalk speech synthesis in Q5.5, Microsoft
- Speech API, and Advanced Speech API.
- * Contact: AT&T Advanced Speech Products Group
- Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
- Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
- Email: aspg@attmail.com
- WWW: http://www.att.com/aspg/
-
-
-
- BBN Hark Telephony Recognizer
-
- * Platform: Available for Unix-based workstation and PC platforms
- including IBM RS6000/AIX and Pentium/SCO Unix.
- * Description: Large vocabulary (2,000+ words), speaker independent,
- continuous ASR software. Specifically designed for large scale
- telephony applications. Using a client/server architecture, all
- features and capabilities are integrated in one software product
- instead of on separate boards. Very memory efficient, the Hark
- Telephony Recognizer runs in as little as 2MB of physical memory.
- Multiple recognizers can be run on a single platform. Uses Hidden
- Markov Model and phoneme-based BBN recognition algorithms. An API
- is provided for integration with existing applications. A
- developer's toolkit is available.
- * Price and availability: Price varies depending on vocabulary size.
- Version 3.0 available immediately.
- * Misc: BBN Hark provides application design and human factors
- consulting services. Regular monthly training classes on
- developing speech-enabled applications are held at BBN Hark's
- Cambridge (Mass) headquarters.
- * WWW: For additional information see BBN Hark's home page.
- * Contact: BBN Hark Systems
- 70 Fawcett Street, Cambridge, MA 02138, USA
- Tel: 617-873-4636 Fax: 617-873-2473
- WWW: http://www.bbn.com/bbn_hark/HarkHome.html
-
-
-
- Berkeley Restaurant Project (BeRP)
-
- * Description: BeRP is a test bed for a speech recognition system
- being developed by the International Computer Science Institute in
- Berkeley, CA. BeRP is a medium-vocabulary, speaker-independent
- spontaneous continuous speech understanding system. BeRP functions
- as a knowledge consultant whose domain is the restaurants in the
- city of Berkeley. The system serves as a testbed for several
- research projects, including robust feature extraction,
- connectionist phonetic likelihood estimation, automatic induction
- of multiple pronunciation lexicons, foreign accent detection and
- modeling, advanced language models, and lip-reading.
- * Note: As far as I know the BeRP software is in-house software -
- that is, it is not made available for distribution.
- * More information: http://www.icsi.berkeley.edu/real/berp.html
-
-
-
- Cambridge Voice for Windows
-
- * Platform: Windows
- * Description: Speaker-independent recognition of continuous speech
- in real time. Vocabularies can range from small to very large
- (more than 60,000 word forms). Support is planned for languages
- including English, Danish, Dutch, French, German, Italian,
- Norwegian, Spanish, Swedish, and Japanese. The engine complies
- with the Microsoft Speech API.
- * Contact: Cambridge Group Research, Ltd.
- Box 7290, Buffalo Grove, IL 60089
- Ph: (708) 821-1040, Fax: (708) 821-1041
- E-mail: 76061.3350@compuserve.com
-
-
-
- CustomVoice and CustomTelephone: A&G Graphics Interface Inc.
-
- * Platform: Windows
- * CustomVoice: Speech recognition custom control for Visual Basic,
- Visual C++, Borland C++, and other development platforms that
- support *.VBX. Provides an engine/proprietary independent
- development platform for speech recognition. Currently supports
- ICSS, but should soon support other platforms. Includes a grammar
- debugger and parser APIs to parse spoken speech into useful data
- types.
- Requirements: 486/DX or better PC, Windows 3.1 or Windows for
- Workgroups, 8Mb RAM (minimum), SoundBlaster 16, microphone, and
- mouse. Supports Visual Basic, Visual C++, Borland C++, and Delphi.
- * CustomTelephone: Windows-based developers tool that allows
- programmers to build speech enabled "telephony" applications via
- standard custom control properties (VBX). It supports IBM
- VoiceType Application Factory (VTAF), a continuous speech, speaker
- independent speech recognizer, and supports voice response boards
- such as Dialogic. Comes with a VB custom control, pre-built
- grammar sets for common data types, an interactive grammar
- debugger to identify valid speech patterns, and parser API
- functions that convert recognized speech into data types supported
- by VB, C++ and Delphi. Includes sample applications with source
- code, and VBX, VCL and DLLs. Bundled with speech recognition
- engines.
- Requirements: 486/DX or better, Windows 3.1 or Windows for
- Workgroups, 8Mb RAM (minimum), SoundBlaster or compatible sound
- card, Dialogic D2X or D4X board, and mouse. Microphone and speaker
- optional. Supports Visual Basic, Visual C++, Borland C++, and
- Delphi.
- * Contact: A&G Graphics Interface
- 51 Gore Street, Cambridge, MA 02141-1213, USA
- Ph: +1-617-492-0120, Fax: +1-617-427-2133
- Email: customvc@world.std.com
- CompuServe: 74774,273 CompuServe ( GO SPEECH )
- WWW: http://www.customvoice.com/
-
-
-
- DATAVOX - French
-
- * Platform: PC / DOS
- * Description: Continuous speech - speaker independent or dependent.
- * Requirements: 2 PC format boards (RdF1000 and TdS 96/25) and an
- A/D - D/A module (ASA116)
- * Misc: Application software can communicate with DATAVOX through two
- types of interface:
- + Keyboard overlay: The application software may be used with
- any PC compatible package. No specific adaptation is
- necessary, you only need to define your configuration with
- the application software.
- + C library: Allows a user-written program to drive the
- recognition system.
- DATAVOX is based on the AMADEUS speech recognition software
- developed at LIMSI. It provides
- + Continuous speech recognition with 500 words speaker
- dependent, 50 words speaker independent (custom-made
- vocabulary).
- + Grammar of the application language (syntax acquisition,
- verification and simplification software).
- + Large vocabulary : DATAVOX can recognize vocabularies of
- several thousand words as long as there are no more than 500
- words in the active vocabulary at any given node. It takes
- less than 1 second to change syntax and vocabulary.
- + Training controlled by the system (use of co-articulation
- models).
- + Response time less than 500 ms for any phrase length.
- + Synthesis (ADPCM) output can be heard while recognition is
- being carried out.
- * Contact: VECSYS
- Le Chene rond, 91570 Bievres, France
- Voice: 33 1 69 41 15 04, Fax: 33 1 69 41 24 30
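-
- Illustration: the "active vocabulary at any given node" idea above can
- be sketched in Python roughly as follows. This is a hypothetical toy
- grammar, not the DATAVOX C library or the AMADEUS software; the class
- and function names, the node names and the words are all assumptions
- made for the example. Each grammar node lists the words that may be
- spoken next, so the total vocabulary can grow large while every node
- stays under the 500-word limit.
-
```python
# Hypothetical illustration only: not the DATAVOX or AMADEUS API.
MAX_ACTIVE_WORDS = 500  # per-node limit quoted in the entry above


class GrammarNode:
    """One state of the application grammar with its own active vocabulary."""

    def __init__(self, name, transitions):
        # transitions maps each word allowed at this node to the next node name
        if len(transitions) > MAX_ACTIVE_WORDS:
            raise ValueError(f"node {name!r} exceeds {MAX_ACTIVE_WORDS} active words")
        self.name = name
        self.transitions = dict(transitions)

    @property
    def active_vocabulary(self):
        """Words the recognizer would need to consider at this node."""
        return set(self.transitions)


def accepts(nodes, start, words):
    """Return True if the word sequence follows the grammar from `start`."""
    current = start
    for word in words:
        node = nodes[current]
        if word not in node.active_vocabulary:
            return False  # word is not active at this node
        current = node.transitions[word]
    return True


# Made-up two-node grammar: a command word followed by an object word.
nodes = {
    "command": GrammarNode("command", {"open": "object", "close": "object"}),
    "object": GrammarNode("object", {"valve": "command", "door": "command"}),
}
```
-
- With this toy grammar, accepts(nodes, "command", ["open", "valve"])
- succeeds, while a sequence starting with "valve" is refused because
- that word is not active at the "command" node.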
-
-
-
- Digital Dreams Speech Recognition Plug-Ins
-
- * Platform: Apple Macintosh
- * Description (General): A suite of speech plug-ins for the
- interactive multimedia market which enable developers to quickly
- incorporate speech recognition into their titles without having to
- resort to a low-level programming language, such as C. Speech
- plug-ins bridge the gap between a speech recognition API, such as
- Apple's PlainTalk Speech Recognition technology, and
- authoring/development environments, such as Macromedia Director or
- HyperCard. Digital Dreams currently offers Macintosh speech
- plug-ins for Macromedia Director and HyperCard. Support for other
- environments, including AppleScript, Apple Media Tool, Authorware,
- and Windows is being developed. Currently available for North
- American Adult English. More information is available on the
- Digital Dreams WWW site.
- * ShockTalk: is a combination of Netscape, ShockWave and Speech
- Recognition technologies for the Power Macintosh and Quadra AVs
- that enables you to navigate web sites and hyperlinks using spoken
- commands as well as create shockwave movies that respond to spoken
- user interactions.
- * Requirements: Power Macintosh (PowerPC w/ MacOS)
- Microphone (PlainTalk compatible)
- PlainTalk Speech Synthesis and PlainTalk Speech Recognition
- Netscape Navigator
- * Contact: Digital Dreams
- 4308 Harbord Drive, Oakland, CA, 94618, USA
- Tel: (510) 547-6929 Fax: (510) 547-6799
- email: dreams@surftalk.com
- WWW: http://www.surftalk.com/
- FTP: ftp://ftp.surftalk.com/
-
-
-
- DragonDictate for Windows
-
- * Platform: Windows
- * Description: Information moved to the page on Dragon Dictation
- products including DragonDictate for Windows
-
-
-
- Dragon Dictation Products
-
- * Dragon NaturallySpeaking
- * DragonDictate for Windows
- * Dragon PowerSecretary
- * General Information
-
- Dragon NaturallySpeaking
-
- * Platform: Windows
- * Description: General purpose, continuous speech dictation system.
- Personal Edition has a 30,000 word active vocabulary and comes
- with a 200,000+ word pronunciation dictionary; users can also add
- their own words or phrases.
- More information on Dragon's NaturallySpeaking web site.
- * Requirements: 133 MHz Pentium, 32 MB RAM (Windows 95) or 48 MB RAM
- (Windows NT 4.0), supported sound card.
- * Price: see Dragon's NaturallySpeaking web site.
- * Related products: see general information below
- * Contact: see general information below
-
- DragonDictate for Windows
-
- * Platform: Windows
- * Description: Speech-to-text dictation system. Discrete dictation;
- continuous command/control; speaker-adaptive. Also provides mouse
- movement for hands-free operation of Windows. Comes with a 120,000
- word pronunciation dictionary; users can also add their own words
- or phrases. Dictate directly into any application. Available in US
- and UK English, French, Italian, German, Spanish, and Swedish.
- Add-on vocabularies for medicine, law, business and finance,
- computers and technology, journalism.
- Available as DragonDictate Singles Editions (10,000 words active),
- DragonDictate Personal Edition (10,000 words active),
- DragonDictate Classic Edition (30,000 words active), DragonDictate
- Power Edition (60,000 words active).
- Includes Office97 support.
- More information on the Dragon Systems web site.
- * Requirements: 486/66, 7-10 MB dedicated RAM (depending on
- edition), Windows 3.1x, NT 3.51, or 95.
- Supported sound boards: Creative Labs Sound Blaster 16, Microsoft
- Windows Sound System, IBM M-Audio Capture/Playback Adapter, many
- notebooks with built-in audio.
- See Dragon Systems Compatibility list for details.
- * Price: Check at the Dragon Systems web site.
- * Related products: see general information below
- * Contact: see general information below
-
- Dragon PowerSecretary
-
- * Platform: Apple Macintosh
- * Description: Speaker dependent/adaptive system requiring words to
- be separated by short pauses. Available as PowerSecretary Power
- Edition, Personal Edition, PowerSecretary MED for Healthcare
- Professionals.
- Vocabulary: 30,000 - 60,000 at any one time, automatically
- selected from 120,000-word dictionary.
- * Requirements: Power Macintosh 6100, 7100, 8100, Performa 6100
- series, Powerbook 540, 68040 class Macintosh such as Quadra 660AV,
- 700, 800, 840AV, 900, 950, Centris 650 and 660AV.
- Hard Disk with at least 25Mb free.
- System 7.5 or greater
- (Some systems require add-on hardware)
- * More information: PowerSecretary home page
- * Related products: see general information below
- * Contact: see general information below
-
- General Information
-
- Dragon Dictation Products
-
- * Dragon NaturallySpeaking
- * DragonDictate for Windows
- * Dragon PowerSecretary
- * General Information
-
- Dragon Developer Products
-
- * Dragon PhoneQuery
- * DragonXTools
- * Dragon SpeechTool
- * Dragon VoiceTools
-
- Related Web Sites
-
- * Simon Crosby's FAQ for DragonDictate
-
- Contact:
-
- * Dragon Systems, Inc.
- 320 Nevada Street, Newton, MA 02160, USA
- Tel: 1-617-965-5200 or 1-800-TALK-TYP
- Fax: 1-617-527-0372
- Email: info@dragonsys.com
- WWW: http://www.dragonsys.com/
- CompuServe: GO DRAGON
-
-
-
- Dragon Developer Tools
-
- * Dragon PhoneQuery
- * DragonXTools
- * Dragon SpeechTool
- * Dragon VoiceTools
-
- Dragon PhoneQuery
-
- * Platform: Windows NT
- * Description: Software for building voice response systems. Callers
- are able to do the following: Ask for information using completely
- natural and continuous language. Have a spoken dialog to fine tune
- a request. Request information to be faxed, sent by electronic
- mail, or read over the phone, using text-to-speech.
- More information on the Dragon Systems telephony pages.
- * Requirements: Pentium or Pentium Pro PC running Windows NT 4.0.
- Telephone interconnect requirements vary by application.
- * Related products: see general information below
- * Contact: see general information below
-
- DragonXTools
-
- * Platform: Windows
- * Description: VBX and OCX controls that allow an application to
- control DragonDictate's capabilities, ranging from small
- vocabulary command and control to customized large vocabulary
- dictation. More information is available on the Dragon Developer
- pages
- * Related products: see general information below
- * Contact: see general information below
-
- Dragon SpeechTool
-
- * Platform: Windows
- * Description: Create small, optimized vocabularies for your
- speech-enabled applications, or supplement DragonDictate's
- extensive built-in vocabularies with specialized terms and names.
- More information is available on the Dragon Developer pages
- * Related products: see general information below
- * Contact: see general information below
-
- Dragon VoiceTools
-
- * Platform: Windows, DOS
- * Description: integrate small-vocabulary speech recognition
- directly into your DOS and Windows 3.1x applications. More
- information is available on the Dragon Developer pages
- * Related products: see general information below
- * Contact: see general information below
-
- General Information
-
- Related Web Sites
-
- * Simon Crosby's FAQ for DragonDictate
-
- Contact:
-
- * Dragon Systems, Inc.
- 320 Nevada Street, Newton, MA 02160, USA
- Tel: 1-617-965-5200 or 1-800-TALK-TYP
- Fax: 1-617-527-0372
- Email: info@dragonsys.com
- WWW: http://www.dragonsys.com/
- CompuServe: GO DRAGON
-
-
-
- EARS: Single Word Recognition Package
-
- * Platform: Linux and other Unix systems with the Voxware sound driver
- * Description: Intended as a limited ready-to-use single word
- recognizer. However, its design aims at being a platform for
- various kinds of methods used in speech recognition (SR). EARS is
- designed to be a flexible environment for recognition system
- components; for example, take this feature extractor and that
- recognizing method, and this list of words. New methods for single
- word recognition can be integrated easily, as EARS uses C++
- abstract base classes. To train it, you speak the words you want to
- be recognized later. Your utterances can be saved to RIFF WAV files
- so that you can inspect, change or delete them before they are
- processed into the pattern files on which the recognizer is finally
- trained. As of version 0.20, the feature extractors are:
- Rasta-PLP, PLP, LPC, Mel-Cepstrum. The implemented recognizers
- are: DTW and non-recurrent neural nets on fixed-size sound
- patterns.
- * Requirements: Soundcard with mic
- * Misc 1: The current version is an Alpha release.
- * Misc 2: For more information subscribe to the EARS mailing list.
- Send email to majordomo@phil.uni-sb.de with "subscribe ears-list"
- in the body.
- * Misc 3: Niels Thorwirth (thorwir@pi4.informatik.uni-mannheim.de)
- has made changes to Version 0.14 which support the AF audio server
- software (see Q1.11) and the OGI Speech Tools (see Q1.9) so that
- EARS is more portable to other UNIX platforms. Available by email
- to Niels.
- * Availability: Source and Linux binaries are available by anonymous
- ftp
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/ears-0.26.tar.gz
- ftp://sunsite.unc.edu/pub/Linux/apps/sound/speech/ears-0.26.tar.gz
- * Contact: Ralf W. Stephan: ralf@ark.franken.de
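-
- Illustration: DTW (dynamic time warping) is one of the recognizers
- listed above. A minimal Python sketch of DTW template matching for
- single words might look like this. It is not the EARS C++ code; the
- function names and the one-dimensional "feature vectors" are invented
- for the example, and real features (PLP, LPC, Mel-Cepstrum) would have
- many more dimensions.
-
```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # frame of b repeated
                                 cost[i][j - 1],      # frame of a repeated
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Made-up training templates, one per word.
templates = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(5.0,), (5.0,), (4.0,)],
}
# The same "yes" pattern spoken more slowly (frames repeated).
utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
word = recognize(utterance, templates)
```
-
- The slower utterance still matches the "yes" template because DTW
- allows frames to be stretched in time before distances are summed.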
-
-
-
- Ficomp Interpreter 6000
-
- * Platform: DOS, Windows 3.1, Win95, Win NT, UNIX
- * Description: Ficomp Systems, inc., is a systems integrator that
- has developed commercial speaker-dependent, continuous-speech
- recognition applications for use in high noise environments on
- several platforms. Applications are specialized in the finance
- industry for exchange floors, banks and brokerage firms.
- * Contact: Ficomp Systems, Inc.
- Ph: (732) 274-2600, Fax: (732) 274-2601
- 117 Docks Corner Road, Dayton, NJ 08810
- E-Mail: fsisales1@aol.com
- WWW: http://www.ficompsystems.com/
-
-
-
- HM2007 - Speech Recognition Chip
-
- * Platform: Integrated circuit.
- * Description: HM2007 is a 48-pin single chip CMOS voice recognition
- LSI circuit with on-chip analog front end, voice analysis,
- recognition process and system control functions. A 40 word
- isolated-word voice recognition system can be composed of an
- external microphone, keyboard, SRAM and a few other components.
- When combined with a microprocessor, an intelligent recognition
- system can be built. A demo board for this chip is being
- distributed by The Summa Group.
- * Cost: Approx US$16 for the HM2007 and US$160 for the demo board.
- * Misc: Jean-Pierre Lereboullet's document on Voice Recognition
- Processors provides additional information on the HM2007.
- * Producer: HUALON Microelectronic Corp. USA
- Tel: (415) 288 0390 Fax: (415) 288-0399
- * Distributor 1: Marywale Engineering Company
- Tel: (602) 247 4451 Fax: (602) 247 6167
- Email: meco@indirect.com
- * Distributor 2: The Summa Group Limited
- One California Street, Suite #1940,
- San Francisco, CA 94111
- Ph: (415) 288-0390
- * Distributor 3: Images Company
- 39 Seneca Loop, Staten Island, NY 10314, USA
- Ph: +1-718-698-8305, Fax: +1-718-982-6145
- Sells single-piece quantities of the HM2007 48-pin DIP chip and the
- HM2007 52-pin PLCC chip. Sells HM2007 demo kits unassembled
- ($100.00) and assembled ($135.00), using the 48-pin DIP chip.
-
-
-
- Entropic's HTK (HMM Toolkit)
-
- * Platform: Range of Unix platforms.
- * Description: HTK is a software toolkit for building continuous
- density HMM based speech recognisers. It consists of a number of
- library modules and a number of tools. Functions include speech
- analysis, training tools, recognition tools, results analysis, and
- an interactive tool for speech labelling. Many standard forms of
- continuous density HMM are possible. Can perform isolated word or
- connected word speech recognition. It can model whole words or
- sub-word units. Can perform speaker verification and other pattern
- recognition work using HMMs. HTK is now integrated with the
- ESPS/Waves speech research environment which is described in
- Section 1.9.
- * Misc 1: The availability of HTK changed in early 1993 when
- Entropic obtained exclusive marketing rights to HTK from the
- developers at Cambridge.
- * Misc 2: More detailed information on HTK is available from the
- Entropic WWW server: http://www.entropic.com/htk.html
- * Cost: On request.
- * Contact:
-
- Entropic Research Laboratory,
- 600 Pennsylvania Ave, S.E. Suite 202,
- Washington, D.C. 20003, USA
- Phone: (202) 547-1420.
- email - info@entropic.com
- WWW: http://www.entropic.com/
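-
- Illustration: recognition with HMMs usually comes down to finding the
- most likely state sequence for an observed signal (Viterbi decoding).
- The Python sketch below shows the idea on a toy model. It is not HTK
- code and does not use HTK file formats; for brevity it uses a discrete
- emission table rather than the continuous density models HTK builds,
- and the states, symbols and probabilities are all made up.
-
```python
import math


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start, trans, emit):
    """Return (best state path, its log probability) for a discrete HMM."""
    # best[s] = (log probability, path) of the best path ending in state s
    best = {s: (log(start[s]) + log(emit[s][observations[0]]), [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # pick the predecessor that gives the best score on arrival in s
            prev = max(states, key=lambda p: best[p][0] + log(trans[p][s]))
            score = best[prev][0] + log(trans[prev][s]) + log(emit[s][obs])
            step[s] = (score, best[prev][1] + [s])
        best = step
    final = max(states, key=lambda s: best[s][0])
    return best[final][1], best[final][0]


# Toy two-state left-to-right model; all numbers are invented.
states = ["a", "b"]
start = {"a": 1.0, "b": 0.0}
trans = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.0, "b": 1.0}}
emit = {"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.2, "y": 0.8}}
path, log_prob = viterbi(["x", "x", "y", "y"], states, start, trans, emit)
```
-
- Working in log probabilities avoids numerical underflow on long
- observation sequences, which is why the sketch adds logs instead of
- multiplying probabilities.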
-
-
-
- IBM VoiceType Dictation
-
- * Platform: OS/2 and Windows
- * Description: IBM VoiceType Dictation supports speech input at
- 70-100 words a minute and can be used to control your desktop and
- applications. Isolated-word, speaker-dependent system using a
- speech adapter card. Available for U.S. English, U.K. English,
- French, German, Italian, Spanish and Arabic. Provided with a
- general office vocabulary and support for major OS/2 and Windows
- applications. Additional specialised vocabularies are available:
- + US: Legal, Emergency Medicine, Radiology and Journalism
- + UK: Legal
- + IT: Radiology
- * Requirements: See
- http://www.software.ibm.com/workgroup/voicetyp/vtprod13.html
- * Cost: See
- http://www.software.ibm.com/workgroup/voicetyp/vtordna.html
- * Misc: An IBM VoiceType Dictation FAQ is supported by UltraMedia
- Systems International (a distributor of IBM VoiceType):
- http://www.infi.net/~ums/ibmfaq.htm
- * Demo software: Available on the IBM WWW site:
- http://www.software.ibm.com/workgroup/voicetyp/vtcust1.html
- * Contact: US Ph: 1-800-TALK-2-ME or 1-914-766-1900.
- Email: talk2me@vnet.ibm.com
- WWW: http://www.software.ibm.com/workgroup/voicetyp/vtcust1.html
-
- IBM VoiceType Control (US Only)
-
- * Platform: OS/2 and Windows
- * Description: VoiceType Control is a speech recognition navigator
- that lets you control programs by speaking. VoiceType Control
- converts voice commands to keystroke macros. The program provides
- speaker independent, continuous speech recognition, so you do not
- have to train the program for your specific speech patterns.
- * Requirements: ?
- * Cost: ?
- * Demo software:
- http://www.software.ibm.com/workgroup/voicetyp/vtcust2.html
- * Contact: US Ph: 1-800-TALK-2-ME or 1-914-766-1900.
- Email: talk2me@vnet.ibm.com
- WWW: http://www.software.ibm.com/workgroup/voicetyp/vtcust2.html
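-
- Illustration: the "voice commands to keystroke macros" idea can be
- sketched in Python as a simple lookup table. This is only a guess at
- the general shape, not IBM's implementation; the phrases, the key
- names and the function name are hypothetical.
-
```python
# Hypothetical command table; phrases and key names are illustrative only.
MACROS = {
    "save file": ["CTRL+S"],
    "close window": ["ALT+F4"],
    "select all and copy": ["CTRL+A", "CTRL+C"],
}


def keystrokes_for(phrase, macros=MACROS):
    """Return the keystroke sequence for a recognized phrase, or [] if unknown."""
    # Normalize case and spacing so "  Save   FILE " matches "save file".
    normalized = " ".join(phrase.lower().split())
    return list(macros.get(normalized, []))
```
-
- An application would pass the recognizer's text output to
- keystrokes_for() and replay the returned keys; unknown phrases yield
- an empty list so nothing is typed by mistake.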
-
-
-
- IN CUBE
-
- * Platform: Three versions for Windows 95, Windows NT and Sun
- SPARCstations
- * IN CUBE for Windows 95: Developed for general purpose Windows 95
- users. It is packaged for online distribution with a full working
- demo and an option to register and unlock the full product. The
- system uses Command Corp's Mark II continuous speech recognition
- engine and handles changeable lexicons of up to 75 commands.
- + Price: $49.95 US
- + Requirements: 386/25MHz processor or better, Microsoft
- Windows 3.1 or later, Windows compatible sound card or
- built-in audio, and microphone.
- + Availability: http://www.commandcorp.com/cci/win95.html
- Demo mode available.
- * IN CUBE Mark II Pro for Windows NT: IN CUBE is a continuous
- realtime speech recognition system developed to provide a fast and
- convenient means of window navigation and voice macro command
- input for command intensive applications like CAD and publishing.
- Speaker-dependent training and ability to add new commands and
- macros.
- + Price: $495 including the PRO 8 microphone. $540 including
- the MT 858 desk microphone.
- + Requirements: Windows NT, Windows NT-compatible audio board
- (16-bit audio recommended).
- + Availability: http://www.commandcorp.com/cci/pront.html
- Demo available.
- * IN CUBE Voice Command for Sun SPARCstations: Provides continuous
- realtime speech recognition system for window navigation and voice
- macro command input to the workstation. Speaker-dependent training
- and ability to add new commands and macros.
- An IN CUBE Application Programming Interface with a library of
- linkable object modules is available for developers.
- + Price: $495 per seat. The developer's API sells for $695.
- + Requirements: SUN OS 4.1.x or Solaris 2.x with OpenWindows
- and Motif. Works with all audio-equipped SPARCs and clones.
- Models range from SPARCStation 1s to SPARCStation 20s.
- + Availability: http://www.commandcorp.com/cci/in3sparc.html
- A free 5 day evaluation license is available.
- * Contact: Command Corp. Inc.,
- 3761 Venture Drive, PO Box 956099, Duluth, Georgia, 30136, USA
- Ph: +1-770-813-8030
- Email: in3@commandcorp.com
- WWW: http://www.commandcorp.com/incube_welcome.html
-
-
-
- Jialong He's Speech Recognition Research Tool
-
- * Platform: SUN SPARC (SunOS), PC (MSDOS)
- * Description: This is a speech recognition research tool. It
- contains a feature extraction program and three speech
- recognizers: a DTW recognizer, a discrete hidden Markov model
- (DHMM) based recognizer, and a continuous density hidden Markov
- model (CHMM) based recognizer with Gaussian mixture functions. The
- utilities are grouped as:
- + feature -- extract feature vectors from a speech signal (MFCC
- etc.)
- + dtwcmp -- dynamic time-warping (DTW) comparison.
- + gensym -- turn vector sequences to discrete observation
- symbols.
- dhmm -- discrete HMM training program.
- dtest -- DHMM companion test program.
- + chmm -- continuous density HMM training program.
- viterbi -- CHMM companion test program.
- Note, this is a research tool not a complete speech recognition
- system.
- * Availability: By anonymous ftp:
-
- MSDOS Version
- UK:
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spchtool.zip
- Germany:
- ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spchtool.zip
-
- Sun SPARC version, compiled with GNU C
- UK:
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spch_sun_v1.tar.gz
- Germany:
- ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speech_sun_v1.tar.gz
-
- * See also: Jialong He's Speaker Recognition (Identification) Tool
- * Contact: Jialong He
- email: jialong@neuro.informatik.uni-ulm.de
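-
- Illustration: turning vector sequences into discrete observation
- symbols (the job listed for gensym above) is commonly done by vector
- quantization, where each feature vector is replaced by the index of its
- nearest codebook entry. The Python sketch below assumes that approach;
- it is not taken from the tool itself, and the codebook, frames and
- function name are invented for the example.
-
```python
import math


def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    symbols = []
    for frame in frames:
        nearest = min(range(len(codebook)),
                      key=lambda k: math.dist(frame, codebook[k]))
        symbols.append(nearest)
    return symbols


# Made-up 2-dimensional codebook and feature frames.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.3), (1.1, 0.8)]
symbols = quantize(frames, codebook)  # symbol sequence for a discrete HMM
```
-
- The resulting symbol sequence is what a discrete HMM trainer would
- consume; how the codebook itself is built (for example by clustering
- training vectors) is left out of this sketch.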
-
-
-
- Kurzweil Voice for Windows
-
- * Platform: Windows 3.1 or later
- * Description: Kurzweil Voice for Windows is a dictation product
- enabling the user to create text and enter data by speaking to
- Windows-based applications. System is adaptive but requires no
- initial training. Users can choose either 30,000 or 60,000 word
- active vocabulary. Application command translation templates are
- provided for popular Windows applications such as WordPerfect,
- 1-2-3, Organizer, and Word (30+ applications are listed on the
- Kurzweil WWW pages). More detailed information is available on the
- Kurzweil WWW pages.
- * Requirements: 486DX/33 or higher, 8 or 16 MB dedicated memory
- (depends on vocabulary), 30 MB dedicated disk space, VGA or
- higher, Kurzweil-supplied microphone and DSP board.
- * Contact:
- Kurzweil Applied Intelligence, Inc.
- 411 Waverley Oaks Road, Waltham, MA 02154 USA
- Phone: 1-800-380-1234
- Email: info@kurzweil.com
- WWW: http://www.kurzweil.com/
-
- Kurzweil Clinical Reporter
-
- * Platform: Windows 3.1 or later
- * Description: Kurzweil Clinical Reporter is a voice-activated
- clinical reporting system for computer-based patient records. The
- family of products includes:
- + VoiceEM for emergency medicine
- + VoiceEM/TR for triage reporting
- + VoiceRAD for diagnostic imaging and radiology
- + VoicePATH for surgical and anatomical pathology
- + VoiceMED for Primary Care for family medicine, internal
- medicine and pediatrics
- + VoiceORTHO for office-based orthopaedic surgery
- + VoiceCATH for invasive cardiology
- + VoiceReport for general reporting
- * More information: from the Kurzweil WWW pages:
- http://www.kurzweil.com/medical/
- * Contact:
- Kurzweil Applied Intelligence, Inc.
- 411 Waverley Oaks Road, Waltham, MA 02154 USA
- Phone: 1-800-380-1234
- Email: info@kurzweil.com
- WWW: http://www.kurzweil.com/
-
-
-
- Lernout & Hauspie ASR 1000/T and 1000/M
-
- [Note: L&H asr200/A is described below.]
-
- * L&H asr1000/T: ASR for the Telephony and Telecommunications Market
- * L&H asr1000/M: ASR for the Computer and Multimedia Market
-
- * Description: Automatic speech recognition software providing
- continuous speech recognition, isolated word recognition, keyword
- spotting or continuous digits recognition. The engine is speaker
- independent, and phoneme-based with optimization for commonly used
- words. General features include:
- + Languages available: US English, German, French, Spanish
- (Castilian), Dutch.
- + Available vocabulary: >100,000 words.
- + Line adaptation.
- + Rejection of out of vocabulary/grammar words.
- + N-best alternatives for isolated word recognition and keyword
- spotting.
- + Push to talk.
- * asr1000/T
- + Single channel platform examples: Motorola 56156, TI
- TMS320C2X/C3X/C5X
- + Multi-channel platform examples: TI TMS320C3X/C5X, AT&T
- DSP32C/3210, Motorola 96000
- + Input: 8 kHz telephone sampling
- * asr1000/M
- + Single processor platform examples: Intel 486/Pentium
- + Input: 8 kHz telephone or 11 kHz microphone sampling
- * See also: L&H ASR SDK for Windows
- * More Information: on the Lernout & Hauspie WWW pages:
- http://www.lhs.com/asr.html
- * Cost: Unknown
- * Contact: Lernout & Hauspie Speech Products
- 800 West Cummings Park, Suite 3100
- Woburn, MA 01801, USA
- Tel: (617) 238 0960
- Fax: (617) 238 0986
- Email: sales@lhs.com
- WWW: http://www.lhs.com/
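-
- Illustration: "N-best alternatives" and "rejection of out of
- vocabulary words" can be pictured with the small Python sketch below.
- It is not the L&H API; the function name, the confidence scores and
- the rejection threshold are hypothetical, and a real engine would
- derive its scores from acoustic and language models.
-
```python
def n_best(scores, n=3, reject_below=0.5):
    """Return up to n (word, score) pairs, best first.

    An empty list means the utterance was rejected because even the best
    candidate scored below the threshold.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] < reject_below:
        return []  # treat the input as out-of-vocabulary
    return ranked[:n]


# Made-up confidence scores for one spoken city name.
scores = {"boston": 0.82, "austin": 0.74, "houston": 0.31, "dallas": 0.05}
alternatives = n_best(scores, n=2)
```
-
- Returning several ranked candidates lets the application ask "Did you
- say Boston or Austin?" rather than committing to a single guess.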
-
- Lernout & Hauspie ASR 200/A for the Automotive and Industrial Market
-
- * Description: Automatic speech recognition software providing
- isolated word recognition, keyword spotting and alphabet
- recognition (optional). This engine is robust, speaker independent
- and word based. Other features:
- + Vocabulary: 100 words US English
- + Voice activation detection
- + Response time
- + Platform examples: Analog Devices ADSP2101/5
- + Input: 8 kHz telephone or microphone sampling
- * See also: L&H ASR SDK for Windows
- * More Information: on the Lernout & Hauspie WWW pages:
- http://www.lhs.com/asr.html
- * Cost: Unknown
- * Contact: Lernout and Hauspie Speech Products
- 20 Mall Road, 4th Floor
- Burlington, MA 01803, USA
- Ph: +1-617-238-0960, Fax: +1-617-238-0986
- Email: sales@lhs.com
- WWW: http://www.lhs.com/
-
-
-
- Lernout & Hauspie ASR SDK
-
- * Platform: Windows
- * Description: Windows based Software Development Kits are available
- for integrating automatic speech recognition technology with
- Windows based PC applications.
- * Requirements: IBM-compatible 486 DX/33 MHz + 8 MB RAM + MS DOS 5.0
- + MS Windows 3.1 (or higher) + Sound Blaster compatible sound
- board.
- * See also: L&H ASR Products
- * More Information: on the Lernout & Hauspie WWW pages:
- http://www.lhs.com/asr.html
- * Contact: Lernout and Hauspie Speech Products
- 20 Mall Road, 4th Floor
- Burlington, MA 01803, USA
- Ph: +1-617-238-0960, Fax: +1-617-238-0986
- Email: sales@lhs.com
- WWW: http://www.lhs.com/
-
-
-
- Listen for Windows 2.0 from Verbex Voice Systems
-
- * Platform: Windows
- * Description: Listen for Windows Version 2.0 is a Speaker
- Independent software product that provides continuous speech
- recognition for Windows applications. The product works with most
- industry standard sound cards and PCs with embedded audio chips.
- Listen for Windows comes with over 16,000 commands in speech
- interfaces for over 40 software applications, such as MS Office,
- Lotus SmartSuite, Quicken, etc. The Listen Command Editor allows a
- user to change or add commands to existing speech interfaces or
- create new speech interfaces for most Windows applications.
- More detailed information is available on the Verbex Listen for
- Windows page.
- Verbex also sells Verbal Advantage Voice Browser for controlling a
- web browser and Verbal Advantage DeskTop for controlling desktop
- applications.
- * Requirements: 486/25SX PC or higher
- * Pricing and Availability: See the Verbex ordering page for pricing.
- Verbex products are available over the web or can be shipped.
- Microphones available from Verbex.
- * Demo: A "Freeware" demo is available from the Verbex WWW site demo
- page.
- * Contact: Verbex Voice Systems
- 1090 King Georges Post Rd., Bldg 107, Edison NJ 08837, USA
- Ph: 1-800-ASK-VRBX, (908) 225-5225, Fax:(908) 225-7764
- WWW: http://www.verbex.com/
-
-
-
- Lotec Speech Recognition Package
-
- * Platform: Sun
- * Description: Public domain speech recognition software. Operates
- from input in Sun audio format (.au files) and outputs word
- hypotheses and time labelling data. The software includes programs
- to collect speech samples, a labeller, a "featurizer" which
- parameterises speech files, a word spotter and the recogniser. The
- software can perform real-time recognition on a Sparc 10 for small
- vocabularies.
- * Requirements: Sun SPARC audio input and a "decent" microphone, Sun
- multimedia demo software (in /usr/demo/SOUND), and X.
- * Availability: By anonymous ftp
- ftp://ftp.sanpo.t.u-tokyo.ac.jp/pub/nigel/lotec/lotec.tar.Z
- * Contact: Nigel Ward: nigel@sanpo.t.u-tokyo.ac.jp
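-
- Illustration: files in Sun audio format (.au) begin with a header of
- six big-endian 32-bit fields: the magic number ".snd", the data
- offset, the data size, an encoding code, the sample rate and the
- channel count. The Python sketch below decodes that header. It is
- independent of the Lotec software, the function name is made up, and
- the header it parses is built in memory rather than read from a real
- recording.
-
```python
import struct

AU_MAGIC = 0x2E736E64  # the ASCII bytes ".snd"


def parse_au_header(data):
    """Decode the fixed 24-byte header at the start of a Sun .au file."""
    if len(data) < 24:
        raise ValueError("too short for a .au header")
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("not a .au file")
    return {"data_offset": offset, "data_size": size, "encoding": encoding,
            "sample_rate": rate, "channels": channels}


# Build a header for 8 kHz, mono, 8-bit mu-law audio (encoding code 1).
header = struct.pack(">6I", AU_MAGIC, 24, 0, 1, 8000, 1)
info = parse_au_header(header)
```
-
- To inspect a real file, read its first 24 bytes in binary mode and
- pass them to parse_au_header(); the audio samples start at the
- returned data offset.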
-
-
- Macintosh Speech Recognition Manager
-
- * Platform: Macintosh
- * Description: supports developers who wish to add speech
- recognition to existing Macintosh applications. Provides speaker
- independent recognition and robustness to noise. Apple's Speech
- home page provides developer information and the complete speech
- recognition and synthesis SDKs. The recognition SDK
- includes sample code, control panels, interfaces, documentation
- and the recognizer.
- * Availability: under licensing conditions from the Macintosh Speech
- Developer's page
- http://www.speech.apple.com/speech/dev/dev.html.
- * Requirements: Power Macintosh with 16-bit sound, System 7.5, and a
- PlainTalk Microphone or equivalent
- * Cost: Free
- * See also: Macintosh Plaintalk and Speech Manager (Q5.5).
- * Note: Check out Kevin Lenzo's list of Macintosh Speech
- Applications.
- * Contact: Apple Computer, Inc.
- 1 Infinite Loop, Cupertino, CA 95014, USA
- WWW: http://www.speech.apple.com/
- Email: PlainTalk@atg.apple.com
-
-
-
- Microsoft Speech Recognition
-
- Microsoft Dictation Research Demonstration
-
- * Platform: Windows 95 or Windows NT 4.0
- * Description: A free demonstration of research technology that
- enables a computer to transcribe what you speak into Windows
- applications such as email and word-processors. Features of the
- demo software include:
- + 60,000 word vocabulary with the ability to add new words
- + High recognition accuracy
- + Works with any Windows application
- + "Dictation Pad" provides enhanced dictation features
- + "IntelliSense" converts spoken numbers and times
- automatically
- + Compatible with the Microsoft Speech API
- * Requirements: Windows 95 or Windows NT 4.0, Pentium 90 or better
- (RISC builds are available), 16 megabytes of RAM on Windows 95,
- Sound card with 16 kHz 16 bit input signals, High quality
- close-talk microphone, Speakers.
- * Availability: Free demo software is available at:
- http://www.research.microsoft.com/research/srg/install.htm
- * More information: http://www.research.microsoft.com/research/srg/
-
- Microsoft Command and Control Engine
-
- * Platform: Windows 95
- * Description: Provides command and control speech recognition using
- SAPI (the Microsoft Speech API) and "Whisper", Microsoft's speech
- recognition technology. Features include:
- + Speaker independent, continuous, sub-word modeling, context
- free grammars
- + Has its own letter-to-sound rules, so it can recognize any
- words in a grammar.
- + North American English
- + PC microphone and telephone speech recognition with high
- performance
- + Word spotting option
- + Results objects containing top-N choices, segmentation, and
- confidence
- + Written to SAPI, the Microsoft Speech API.
- * Requirements: Windows 95 or Windows NT 4.0, Pentium 60 or better.
- (RISC builds are available), 1.5 megabyte working set, 16 kHz or 8
- kHz input signals, 6 megabytes on disk, Requires Microsoft Speech
- SDK to use.
- * Availability: Free demo software is available at:
- http://www.research.microsoft.com/research/srg/install.htm
- * More information: http://www.research.microsoft.com/research/srg/
-
-
-
- Myers' Hidden Markov Model software
-
- * Platform: Unix
- * Description: Hidden Markov model software for automatic speech
- recognition. C++ code that implements a basic left-right hidden
- Markov model and corresponding Baum-Welch (ML) training algorithm.
- It is meant as an example of the HMM algorithms described by
- L.Rabiner and others. The code was built in order to learn how HMM
- systems work and we are now offering it to the net so that others
- can learn how to use HMMs for speech recognition. Keep in mind
- that ease of understanding was our primary concern, not
- efficiency. The code can be used to build experimental speech
- recognition systems using "train_hmm" and "test_hmm", and can be
- used in conjunction with written tutorials on HMMs to understand
- how they work.
- * Availability: By anonymous ftp from the comp.speech archive site.
- There are two files in the directory
- + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/
- The files are
- + hmm.README
- + hmm-1.03.tar.gz
- * Contact: Richard Myers: rmyers@isx.edu
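-
- As a companion to the description above, here is a minimal sketch of
- the forward pass that Baum-Welch training relies on, written for a
- left-right model. It is an illustration in Python, not code from the
- hmm-1.03 package; the function name and the toy probabilities are
- assumptions.
-

```python
def forward_likelihood(trans, emit, observations):
    """Forward algorithm for a left-right HMM that starts in state 0.

    trans[i][j] is P(next state j | state i); in a left-right model it is
    zero whenever j < i.  emit[i][o] is P(symbol o | state i).
    Returns P(observations | model), summed over all final states.
    """
    n = len(trans)
    alpha = [0.0] * n
    alpha[0] = emit[0][observations[0]]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs]
            for j in range(n)
        ]
    return sum(alpha)

# Toy two-state model with two discrete output symbols.
trans = [[0.5, 0.5],
         [0.0, 1.0]]   # no transition back from state 1 to state 0
emit = [[0.9, 0.1],
        [0.2, 0.8]]
likelihood = forward_likelihood(trans, emit, [0, 1])
```

- Real systems work with log probabilities or scaling to avoid underflow
- on long utterances; that is omitted here for clarity.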
-
-
-
- NCC Dictate
-
- * Platform: Windows
- * Description: NCC Digital Dictate(TM) is an add-on, enhanced
- interface for use with IBM's VoiceType(TM) Dictation for Windows
- and various Windows 3.1 applications (e.g. MS Word, WordPerfect).
- Digital Dictate(TM) provides faster corrections and dictation rates
- and various other features. This version is not a stand-alone
- product; it requires VoiceType(TM) Dictation to provide the speech
- recognition engine and the Windows application. Features include:
- + Direct dictation into Windows applications with access to all
- functions while dictating.
- + Versions for MS Word, WordPerfect, Ami Pro, and other Windows
- applications.
- + Speech enabled editing.
- + Capability to save speaker models and defer corrections.
- + Microphone "pause and restore" functions controlled with
- speech commands.
- + Add-on vocabularies for legal, medical, science and business.
- + SWITCH-IT(TM) foot pedal control or CardSwitch(TM) infrared
- wireless control is available to switch between dictation and
- proofing/correction modes.
- * Requirements: IBM's VoiceType(TM) Dictation for Windows; a computer
- system meeting VoiceType(TM) Dictation for Windows requirements;
- VoiceType(TM) Dictation Adapter.
- * Availability: Through computer dealerships.
- * Price: $US295
- * Contact: NCC Incorporated
- 5808 E. Turquoise, Scottsdale, AZ 85253
- Ph: (602) 922-6236 Fax: (602) 596-9050
-
-
-
- NICO Artificial Neural Network Toolkit
-
- * Platform: UNIX (ANSI C source code)
- * Description: The NICO Toolkit is an artificial neural network
- toolkit specifically designed and optimized for automatic speech
- recognition applications. Networks with both recurrent connections
- and time-delay windows are easily constructed. The network
- topology is flexible -- any number of layers is allowed and layers
- can be arbitrarily connected. Tools for extracting input-features
- from the speech signal are included as well as tools for computing
- target values from standard phonetic label-files.
- * Availability: Through the NICO homepage
- (http://www.speech.kth.se/NICO/index.html)
- or the download page.
- * Contact: Nikko Strom, nikko@speech.kth.se
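-
- To illustrate what a time-delay window means for network input, the
- following Python sketch stacks neighbouring feature frames into one
- input vector per time step. It does not use the NICO tools or their
- file formats; the function name and the zero padding at utterance
- edges are assumptions.
-

```python
def delay_window(frames, delays):
    """For each time step t, concatenate frames[t + d] for every d in delays.

    Frames that fall outside the utterance are replaced by zero vectors.
    """
    dim = len(frames[0])
    zero = [0.0] * dim
    windowed = []
    for t in range(len(frames)):
        row = []
        for d in delays:
            k = t + d
            row.extend(frames[k] if 0 <= k < len(frames) else zero)
        windowed.append(row)
    return windowed

# Three one-dimensional frames, window covering the previous,
# current and next frame.
frames = [[1.0], [2.0], [3.0]]
windowed = delay_window(frames, delays=(-1, 0, 1))
```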
-
-
-
- Nuance Speech Recognition System
-
- * Platform: UNIX-based workstations including Sun and SGI.
- * Description: The Nuance Recognizer features client-server
- architecture with multiple recognizers available on a single
- processing platform. Primarily developed for telephony-based
- applications, the system accepts speaker-independent, continuous
- speech and supports very large vocabularies. Included is a
- "template matching" natural language capability for identifying
- the meaning of speech. A toolkit is available for use in
- developing a wide variety of speech recognition applications.
- * Price and availability: Contact Nuance
- * Contact: Nuance Communications
- 1380 Willow Road, Menlo Park, CA 94025, USA
- Ph: +1-650-847-0000, Fax: +1-650-847-7979
- WWW: http://www.nuance.com/
-
-
-
- OKI VRP6679 - Voice Recognition Processor
-
- * Platform: Integrated circuit.
- * Description: Speech recognition IC. 25 words max. Speaker
- independent recognition capability. Recognition rate quoted as 97%
- in a noisy environment (e.g. a car).
- * Misc: Alias MSM6679
- * Misc 2: More information is provided in Jean-Pierre Lereboullet's
- document on Voice Recognition Processors.
- * Cost: Approx US$20. Demo board $876
- * Availability: OKI Semiconductor and OKI Distributors
- Corporate Headquarters
- 785 North Mary Avenue, Sunnyvale, CA, 94086 2909
- Tel: (408) 720 1900, Fax: (408) 720 1918
-
-
-
- Phonetic Engine 500 (PE500) from Speech Systems, Inc.
-
- * Platform: Windows
- * Description: Speaker independent, 40,000 word vocabulary,
- continuous speech recognition for MS Windows. Grammars with high
- perplexity possible. Includes noise rejection. Uses proprietary
- DSP board.
- * Cost: Prices in US$ - quantity one. The PE500 SDK is $995.00
- including board, microphone, and runtime software. Runtime only is
- $595.00. SpeechWizard(r) adds speech input to existing Windows
- applications, $295.00. Two-day training: $295.00 with purchase,
- $595.00 without.
- * Misc: The user defines the grammar of allowed utterances and must
- write software to invoke the board driver functions that control
- recognition. The user must also write software to
- collect/parse/interpret the ASCII text strings returned when
- recognition succeeds.
- * Misc 2: SSI now offers speech application development services.
- * Contact:
-
- Speech Systems, Inc.
- 2945 Center Green Court South
- Boulder, CO 80301-2275, USA
- Tel: 303.938.1110 Fax: 303.938.1874
- http://www.speechsys.com
-
-
-
- Philips Speech Recognition (2 products)
-
- SpeechMagic: Dictation
-
- * Platform: Windows 3.1 and higher
- * Description: A continuous speech recognizer providing a 64,000
- word vocabulary, speaker adaptation and multiple languages.
- SpeechMagic is currently available for English and German.
- SpeechMagic acts as a server application, processing speech input
- and providing text output. Uses an add-on ISA compatible
- recognition accelerator board. SpeechMagic provides a correction
- editor, editing and playback of recordings, and a vocabulary
- manager for entering new words, abbreviations, macros and special
- transcriptions (e.g. for foreign words). Windows DDE support and a
- native API are provided for integration.
- * Hardware Requirements: IBM compatible personal computer (486DX/66
- MHz or higher), minimum 16 MB of RAM, hard disk capacity > 500 MB,
- and a Philips LFH 6210 Accelerator Board.
- * More Information: For more information visit the SpeechMagic WWW
- page or the Philips Speech home page.
-
- Speech Processing System 6000s (Europe only)
-
- * Description: Dictation of medical findings using continuous speech
- recognition. Designed for German speaking radiologists and
- encompasses the complete radiology vocabulary. The authors use
- dictation stations (PCs) which are fitted with microphones. The
- transcriptionists use editing stations (also PCs) which are
- additionally fitted with headphones and footswitches. The SP6000s
- has a single speech recognition unit serving all users, and it
- offers automatic data transfer as well as the advantages of
- digital dictation functions.
- * More Information: For more information visit the Philips SP6000s
- WWW page or the Philips Speech home page.
-
-
-
- Dragon PowerSecretary
-
- * Platform: Apple
- * Description: Information moved to the page on Dragon Dictation
- products including Dragon PowerSecretary
- (Previously Articulate PowerSecretary.)
-
-
-
- ProNotes Voice Tools
-
- * Platform: Windows
- * Description: ProNotes Voice Tools are designed to bring the speech
- recognition capabilities of the IBM VoiceType(TM) Dictation System
- for Windows into any program without the need for the programmer
- to directly interface with the speech engine at the API level.
- There are five tools, as described below, which are all available
- in three forms: Visual Basic(TM) Custom Controls (known as VBXs),
- 16-bit OLE Custom Controls, and 32-bit OLE Custom Controls. The
- tools are intended for use by Windows(TM) developers working with
- Windows 3.1(TM), Windows for Workgroups 3.11(TM), Windows NT 3.51
- Workstation(TM), and Windows 95(TM). The custom controls can be
- utilized with any application development environment which
- supports the use of such controls (e.g. Visual Basic and Visual
- C++).
-
- Playback and Record
- An object which allows developers to use the IBM Speech
- Engine to record and play back sound files. Can be used
- to add voice prompts and to allow end users to record and
- playback sound files.
-
- Voice Button
- An object having standard button properties and behavior,
- which can additionally be controlled by voice. The button
- can also be used as a label or a 3D panel.
-
- Dictation Window
- A text box that allows free dictation, voice macro
- utilization, and correction by voice. Each Dictation
- Window has access to global and context sensitive
- vocabularies for both command and dictation. There are
- three correction modes.
-
- Voice List Box
- Has standard list box properties and behavior, but can
- additionally be controlled by voice. A user can select
- items by pronouncing the entry's text or the entries can
- be numbered and selected accordingly.
-
- Voice Navigator
- Provides navigation by voice within an application
- developed with the Voice Tools, between voice-enabled
- objects described above, as well as some standard objects
- found within the application.
-
- * Requirements: Hardware: 80486/33 DX or higher, 60MB hard disk
- space for IBM VoiceType Dictation software, 10MB hard disk space
- for ProNotes Voice Tools, 3.5" floppy, VGA (or compatible), 16MB
- RAM, IBM VoiceType Dictation adapter, microphone, and speakers.
- Software: DOS version 6.0 or later, with SHARE.EXE running,
- Windows 3.1 or later, IBM VoiceType Dictation software, any
- programming environment or system compatible with Visual Basic or
- OLE Custom Controls.
- * Price: Unknown
- * Contact: Pronotes, Inc.
- 1546 Magee Avenue, Philadelphia, PA 19149, USA
- Ph: 800-70-NOTES or +1-215-533-8569, Fax: +1-215-533-1276
- Email: proinfo@pronotes.com
- WWW: http://www.pronotes.com/
-
-
-
- PureSpeech 2.0 Recognition Engine
-
- * Platform: Windows 3.1, Windows 95, Unix, Dialogic Antares DSP
- * Description: Speaker-independent, continuous speech, large active
- vocabulary speech recognition engine for American English, UK
- English, French, German and Spanish. Permits on-the-fly additions
- to the vocabulary using phonetic models and telephone or wideband
- microphone input. Flexible grammar, natural language processing,
- discourse models. Software only with a small RAM/CPU footprint.
- Can be used as a voice user interface (VUI) for PC software
- applications. Can also be used for high-volume call center
- telephony, especially in banks, finance and other specialized
- applications.
- A toolkit for the Dialogic Antares is available.
- * Availability: PureSpeech is not available as a stand-alone
- product. It is available embedded in Windows-based software or as
- a toolkit.
- * Contact: PureSpeech, Inc
- 100 Cambridge Park Drive, Cambridge, MA 02140, USA
- Ph: (617) 441-0000 Fax: (617) 441-0001
- Email: amy@speech.com
- WWW: http://www.speech.com/
-
-
-
- recnet
-
- * Platform: UNIX
- * Description: Speech recognition for the speaker independent TIMIT
- and Resource Management tasks. It uses recurrent networks to
- estimate phone probabilities and Markov models to find the most
- probable sequence of phones or words. The system is a snapshot of
- evolving research code. There is no documentation other than
- published research papers. The components are:
- + A preprocessor which implements many standard and many
- non-standard front end processing techniques.
- + A recurrent net recogniser and parameter files
- + Two Markov model based recognisers, one for phone recognition
- and one for word recognition
- + A dynamic programming scoring package. The complete system
- performs competitively.
- * Cost: Free
- * Requirements: TIMIT and Resource Management databases
- * Contact: Tony Robinson: _ajr@eng.cam.ac.uk_
- * Availability: by anonymous ftp
-
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/recnet-1.3.tar.Z
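-
- The dynamic programming scoring mentioned in the component list can be
- sketched as a minimum edit distance between a reference transcription
- and the recogniser output. This Python sketch is an illustration
- only and is not taken from the recnet sources; the function name and
- the phone strings are assumptions.
-

```python
def score(reference, hypothesis):
    """Return the minimum number of substitutions, deletions and insertions
    needed to turn reference into hypothesis (Levenshtein distance)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        cur = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_tok != hyp_tok),   # match or substitution
            ))
        prev = cur
    return prev[-1]

# Hypothetical phone strings: one deletion ("hh") and one substitution.
ref = "sh iy hh ae d".split()
hyp = "sh iy ae t".split()
errors = score(ref, hyp)
error_rate = errors / len(ref)
```

- The same routine scores word strings if the lists hold words instead
- of phones.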
-
-
-
- Sensory Inc. Integrated Circuits
-
- * Platform: Integrated circuits
- * Description: Sensory's low cost high quality Interactive Speech
- line of speech recognition IC's are designed for consumer
- telephony products, portable consumer electronics, and other
- consumer applications. Technologies available include speech
- recognition (speaker-independent and speaker-dependent), speaker
- verification, speech/music synthesis, digital record/playback, and
- general product control on one chip. Development tools and
- demonstration units are available. Detailed product information on
- the Interactive Speech chips is available from the Sensory
- Circuits WWW site.
- * Contact: Sensory, Inc.
- 521 E. Weddell Drive, Sunnyvale, CA 94089
- Ph: +1-408-744-9000, Fax: +1-408-744-1299
- Email: Sales@SensoryInc.com
- WWW: http://www.sensoryinc.com/
-
-
-
- Simon Says (NeXT)
-
- * Platform: NeXT
- * Description: Provides the ability to link commands to spoken
- phrases.
- * Availability: By anonymous ftp.
- Simon Says demo
- ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.N.b.tar.gz
- Readme file
- ftp://ftp.informatik.uni-muenchen.de/pub/comp/platforms/next/Audio/audio-apps/SimonSaysDemo.1.5.1.README
- * Contact: Metrosoft
- 710 13th Street, Suite 310 X, San Diego, California 92101
- Ph: 619.488.9411 Fax: 619.488.3045
- Email: info@metrosoft.com [NeXTmail welcome]
-
-
-
- smARTspeak from Advanced Recognition Technologies, Inc.
-
- * Platform: Windows, Windows 95, DOS, and General Magic
- It also works on the following processors/microcontrollers: Intel's
- 80x86, Intel's 8031, 8051, Motorola's 68000, and Hitachi's SH1,
- SH3, SH8.
- * Description: smARTspeak is suited to voice command and control
- applications, such as voice dialing in cellular and desktop
- telephones, or voice command operation in computers and multimedia
- products. It uses a compact (10KB size on 16 bit machines), fast,
- user dependent recognition engine.
- smARTspeak can recognize any language in any accent.
- ART recently completed a Software Developer Kit (SDK) for
- smARTspeak, running under Windows 3.1 or higher which allows the
- voice recognition engine to be used within Windows Applications.
- More detailed information on smARTspeak and the SDK is available
- on the ART WWW pages.
- * Availability: Currently licensed to original equipment manufacturers
- (OEMs), system integrators, software and application developers,
- and value added resellers (VARs) who port the technology into
- their products.
- * Contact: Advanced Recognition Technologies, Inc.
- International Office:
- 43 Brodezky Street, POB 39918, 61398 Tel Aviv, Israel
- Ph: 972-3-642-7242, Fax: 972-3-642-5887
- Email: 100274.3223@Compuserve.com
- WWW: http://www.artcomp.com/
- US Office:
- 9574 Topanga Canyon Blvd. Chatsworth, CA 91311, USA
- Ph: 818-678-3999, Fax: 818-678-3994
- WWW: http://www.artcomp.com/
-
-
-
- Speech Commander - Verbex Voice Systems
-
- * Platform: Various: external hardware with serial port connection
- * Description: A hand-held (portable) device about the size of a
- paperback book which provides speaker-dependent continuous speech
- recognition. The active vocabulary is dependent on the model
- chosen and can vary from 300 to 10,000 active words. The device
- connects through a serial port, so it can be connected to a wide
- range of computers. It comes with a battery pack.
- * Contact: Verbex Voice Systems
- 1090 King Georges Post Rd., Bldg 107,
- Edison NJ 08837, USA
- Ph: (908) 225-5225, Fax: (908) 225-7764
- Email: sales@listen.verbex.com
- WWW: http://www.verbex.com/
-
-
-
- 'Speech Recognition Expert' Toolkit for Windows
-
- * Description: Provides an object-oriented development tool designed
- to rapidly build speech enabled applications without writing
- source code. Currently supports IBM's VoiceType Application
- Factory. Future versions to support other platforms. Includes
- BlackBox library and Custom Grammar Tools.
- * Requirements: Layout for Windows from Objects, Inc.
- * Price: $US349 + Shipping/Handling
- * Contact: Speech Technologies, Inc.
- P.O. Box 3905
- Naperville, IL 60567-3905
- CompuServe @102147,3521
- Ph: (708)983-7634
-
-
-
- Visual Voice from Stylus Innovation
-
- * Platform: Microsoft Windows
- * Description: Visual Voice is a toolkit for building Windows-based
- voice processing and telephony applications including interactive
- voice response (e.g. touch-tone banking), fax-on-demand, and voice
- mail. Visual Voice can be used to add voice recognition to your
- telephony applications.
- Voice Recognition (VR) Support for Visual Voice is exposed as a
- standard VBX control and provides one or more voice recognition
- "resources" to your application. Applications can dynamically
- assign resources across several voice lines. Voice recognition is
- either "discrete" or "continuous". Discrete recognition is
- slightly more accurate and requires the speaker to pause briefly
- between words. Continuous recognition provides a natural way to
- enter information by speaking without pauses. Three configurations
- are supported:
-
- Software-Only Solution
- The software only solution uses Telaccount's SpeechEasy
- technology for discrete recognition using your PC's CPU.
- A vocabulary is included with digits, basic command words
- and more.
-
- Hardware-Assisted Solution with Dialogic AEB boards
- Discrete voice recognition in over 25 languages using
- Dialogic D/41D voice boards and the Dialogic VR/40 board.
- Vocabularies are included with digits, basic command
- words, voice mail vocabulary and more.
-
- Hardware-Assisted Solution with Dialogic PEB boards.
- Use the VR control with any Dialogic PEB-based voice
- board, such as the D/12x or D/24x, to access voice
- recognition resources from your phone lines. This
- requires a Dialogic VRP board with either 1 to 4 VRM/40
- modules (4 channel discrete voice recognition modules)
- and/or 1 to 4 VRM/2C modules (2 channel continuous voice
- recognition modules). You can have up to 4 modules on
- each VRP: 4 VRM/40s for 16 channels of discrete voice
- recognition; 4 VRM/2Cs for 8 channels of continuous
- recognition; or a combination. Over 25 languages
- supported. Includes vocabularies as described above.
-
- * Pricing: Unknown
- * Availability: From Stylus Innovation Inc. or from the
- distributors listed on the Stylus WWW pages.
- * Misc: More detailed technical information and slide show
- demonstration software are available on the Stylus home page.
- * Contact: Stylus Innovation Inc.
- One Kendall Square, Building 300, Cambridge, MA 02139
- Ph: (617) 621 9545, Fax: (617) 621 7862
- WWW: http://www.stylus.com/
- Compuserve forum: GO STYLUS
- Email: info@stylus.com
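-
- The channel arithmetic for the PEB-based configuration described
- above (up to 4 modules per VRP board, 4 channels per VRM/40 and 2 per
- VRM/2C) can be checked with a small Python sketch. The function name
- is an assumption and this is not part of the Visual Voice toolkit.
-

```python
CHANNELS_PER_MODULE = {"VRM/40": 4, "VRM/2C": 2}
MAX_MODULES_PER_VRP = 4

def vrp_channels(vrm40: int, vrm2c: int) -> dict:
    """Return discrete and continuous channel counts for one VRP board."""
    total = vrm40 + vrm2c
    if vrm40 < 0 or vrm2c < 0 or not 1 <= total <= MAX_MODULES_PER_VRP:
        raise ValueError("a VRP board takes between 1 and 4 modules")
    return {
        "discrete": vrm40 * CHANNELS_PER_MODULE["VRM/40"],
        "continuous": vrm2c * CHANNELS_PER_MODULE["VRM/2C"],
    }

all_discrete = vrp_channels(4, 0)
all_continuous = vrp_channels(0, 4)
mixed = vrp_channels(2, 2)
```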
-
-
-
- Voice Command Line Interface
-
- * Platform: Amiga
- * Description: VCLI will execute CLI commands, ARexx commands, or
- ARexx scripts by voice command through your audio digitizer. VCLI
- allows you to launch multiple applications or control any program
- with an ARexx capability entirely by spoken voice command. VCLI is
- fully multitasking and will run in the background, continuously
- listening for your voice commands even while other programs are
- running. Documentation is provided in AmigaGuide format. VCLI 6.0
- runs under either Amiga DOS 2.0 or 3.0.
- * Requirements: Supports the DSS8, PerfectSound 3, Sound Master,
- Sound Magic, and Generic audio digitizers.
- * Availability: by ftp from wuarchive.wustl.edu in the file
- systems/amiga/incoming/audio/VCLI60.lha and from
- amiga.physik.unizh.ch as the file pub/aminet/util/misc/VCLI60.lha
- * Contact: Author's email is RHorne@cup.portal.com
-
-
-
- Voice Control Systems Continuous Speech Recognition
-
- * Description: Voice Control Systems (VCS) continuous speech
- recognition is a proprietary phonetic recognizer based on
- technology developed at VCS over the last 17 years. It is robust
- for applications such as the "hands-free" automotive environment
- or telephone networks, both wireless and wireline. VCS speech
- recognition is used by many developers and manufacturers in
- telecommunications. VCS technology is a software-based capability
- which VCS has currently developed for a limited number of
- processing environments. VCS offers "off-the-shelf" capabilities
- for the TI-C3X and C4X DSPs with other hardware platform support
- planned for the future. As a benchmark, today's VCS continuous
- technology requires about 1/2 of a 33 MHz TMS320C31. VCS continuous
- technology is available in cellular and wireline based libraries
- for continuous digit input in approximately 15 languages. VCS
- continuous recognition is a modified HMM decision strategy built
- upon the foundation of the VCS phonetic "front end".
- * Availability: VCS continuous technology is available today in
- software form from VCS or implemented in hardware or speech
- systems from VCS distributors including Dialogic Corporation,
- Brite Voice, Intervoice, Periphonics, and Syntellect.
- * Cost: Software royalties are volume based and range from per unit
- costs of $500 per recognizer to less than $5 in large quantities.
- * See also: the VCS Phonetic Dictionary Recognizer and VCS Isolated
- Word Speech Recognition below, and the VCS 2030 & 2060 Voice
- Dialers.
- * Contact: Voice Control Systems, Inc.
- 14140 Midway Rd., Dallas, Tx. 75244, USA
- Ph: +1-214-386-0300, Fax: +1-214-386-5555
- Email: sales@vcsi.com
- WWW: http://www.voicecontrol.com/
-
- Voice Control Systems Phonetic Dictionary Recognizer
-
- * Description: This recognizer is based upon an HMM-type recognition
- strategy coupled with the VCS "front end" (feature extraction
- software). The HMM modeling is based upon the basic phonetic
- building blocks in each language. In American English this is
- approximately 43 units. The recognition vocabulary is built up by
- combining these units into word models. By building the words in
- this way new recognition vocabularies may be constructed. The
- phonetic assembly can also be used for "word spotting" recognition
- libraries.
- * Platform: This VCS recognition software runs on the TI TMS320C30
- DSP. Two recognizers can operate on a single 55 MHz C30. Currently
- the software may be purchased as an Enhanced Technology from VCS
- to run on the Dialogic VR/160p speech recognizer board. The
- hardware is purchased from Dialogic, with the "Enhanced" software
- purchased from VCS. Up to four phonetic recognizers can run on a
- single 160; one per VRM2C (C30, 33 MHz DSP) daughtercard.
- * Note: This recognizer is in its late "beta" stage of development
- and is available for U.S. English vocabularies. Other languages
- are presently under development.
- * Price: VCS software is priced at $350 per recognizer for unit
- quantities with volume discounts available.
- * See also: VCS Continuous Recognition above, VCS Isolated Word
- Speech Recognition below, and the VCS 2030 & 2060 Voice Dialers.
- * Contact: Voice Control Systems, Inc.
- 14140 Midway Rd., Dallas, Tx. 75244, USA
- Ph: +1-214-386-0300, Fax: +1-214-386-5555
- Email: sales@vcsi.com
- WWW: http://www.voicecontrol.com/
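-
- The idea of building a recognition vocabulary from phonetic units, as
- described for the Phonetic Dictionary Recognizer, can be sketched in
- Python as follows. This is not VCS software: the phone symbols, the
- three states per phone and all names are assumptions chosen for
- illustration.
-

```python
STATES_PER_PHONE = 3  # assumed model size, for illustration only

def build_word_model(phones, inventory):
    """Chain per-phone state sequences into one left-to-right word model."""
    missing = [p for p in phones if p not in inventory]
    if missing:
        raise ValueError(f"phones not in inventory: {missing}")
    return [f"{p}.{s}" for p in phones for s in range(STATES_PER_PHONE)]

# A hypothetical slice of a phone inventory and two vocabulary entries.
inventory = {"k", "ao", "l", "d", "ay"}
vocabulary = {
    "call": build_word_model(["k", "ao", "l"], inventory),
    "dial": build_word_model(["d", "ay", "l"], inventory),
}
```

- Adding a word then needs only its phone sequence, with no new acoustic
- training data for the word itself.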
-
- Voice Control Systems Isolated Word Speech Recognition
-
- * Description: Voice Control Systems (VCS) isolated word recognition
- using VCS phonetic recognizer technology. It is robust in
- demanding environments such as the "hands-free" automotive
- environment, telephone networks, wireless or wireline.
- Capabilities include speaker-independent, speaker-dependent and
- speaker-adaptive recognition. Libraries are available for 45+
- languages and custom vocabulary development services are
- available. The technology is suited for many applications
- including:
- + Desktop computing: such as keyboard accelerators
- or interactive multimedia.
- + Network telephony: such as automating operator functions or
- voice dialing.
- + Computer telephony: such as remote access to a personal
- computer.
- + Automotive accessory control: such as voice activated
- cellular phones or other automotive accessories.
- + Consumer electronics: such as voice controllers for video
- games or VCRs and televisions.
- * Platform: Include Intel-X86, TI-C5X, C3X, C4X and C2X, OKI 6679,
- and NEC-V20 and V30, and can operate on 16 bit microcontrollers.
- As a benchmark, 8 recognizers can run on an Intel 486-33 DX.
- * Availability: The technology is available under software licenses
- direct from VCS or by purchasing hardware from an OEM. VCS OEMs
- include: Dialogic, Oki Semiconductor, Intervoice, Periphonics,
- etc.
- * Cost: VCS isolated word recognition software is available under a
- volume pricing license agreement. Small quantity royalties are in
- the $500.00 per recognizer range while large (millions) quantity
- royalties are less than $1.00 per recognizer.
- * See also: VCS Continuous Speech Recognition and VCS Phonetic
- Dictionary Recognizer above, and the VCS 2030 & 2060 Voice
- Dialers.
- * Contact: Voice Control Systems, Inc.
- 14140 Midway Rd., Dallas, Tx. 75244, USA
- Ph: +1-214-386-0300, Fax: +1-214-386-5555
- Email: sales@vcsi.com
- WWW: http://www.voicecontrol.com/
-
-
-
- Visus SpeechKit
-
- * Platform: NeXT
- * Description: SpeechKit is based on SPHINX, a speaker-independent,
- 1000 word or so, continuous speech recognition system which allows
- you to incorporate speech recognition into your applications. You
- can design your vocabulary and grammars.
- * Contact: Visus - no address or phone provided. A possible contact
- is Robert Brennan at Carnegie Mellon University. email:
- Robert_Brennan@cmu.edu
-
-
-
- VCS 2060 Voice Dialer
-
- VCS 2030 Voice Dialer
-
- * Platform: Stand-alone hardware, TMS320C5X based with VCS phonetic
- speech recognition and CELP speech compression.
- * Description: The VCS 2060 is a telephone dialing system which
- recognizes 50 names - and speed dials the associated telephone
- number. The VCS 2030 has 20 memories. Users use
- speaker-independent recognition to select the "call", "program",
- or "list" menu, then place a call, enroll a new memory, or listen
- to playback of entries in the phonebook. Enrollment is simple and
- includes a "name tag" enrollment pass so that when one selects an
- entry to call, the selection is confirmed by repeating the
- memory's associated name tag, e.g. "calling Pete". The system uses
- both speaker-independent and speaker-dependent technology from
- Voice Control Systems, Inc.
- * Installation: The VCS 2060 can be installed in series (RJ-11) with
- one phone for single phone operation or installed in parallel
- (RJ-31) to provide voice dialing from every phone in a house.
- * Cost: Standard retail prices:
- + VCS 2030 Voice Dialer - $269.00
- + VCS 2060 Voice Dialer - $299.00
- * Availability: From catalogs or direct from Voice Control Systems.
- Voice Control Systems
- 14140 Midway Rd., Dallas, Tx. 75225, USA
- Ph: 800-VCS-7525, Fax: +1-214-386-5555
- Email: sales@vcsi.com
- WWW: http://www.voicecontrol.com/
-
-
-
- Voice-Trek 2.0
-
- * Platform: Unknown.
- * Description: VoiceTrek is primarily used by the United States
- Postal Service to sort mail. Tardis Technology Inc. was created to
- develop and market applications that utilize speech recognition.
- They do consulting work as well as turnkey systems.
- * Contact: Tardis Technology Inc., Voice Recognition Div.
- 6444 E. Spring St., #286, Long Beach, CA 90815-1500, USA
- Phone: +1-310-497-0077, Fax: +1-310-497-0080
-
-
-
- VoiceAssist for Windows from Creative Labs, Inc.
-
- * Platform: Windows
- * Description: Seeking a description.
- * Availability: VoiceAssist preview software is available from the
- Creative Labs VoiceAssist home page.
- * Contact: Creative Labs, Inc.
- Ph: 1-800-998-1000 (Sales)
- Ph: 1-800-998-5227 (Product info and dealer referrals)
- CompuServe: support forum: GO BLASTER
- WWW: http://www.creaf.com/
-
-
-
- VoiceServer for Windows
-
- * Platform: Windows
- * Description: Speaker dependent; each user has an independent
- directory. Isolated words. Up to 1000 words/user, 300
- words/window. 1 word occupies 2Kb on hard disk. Can be used to
- control Windows applications by issuing voice commands instead of
- menu selection.
- * Rough Cost: 292 Pounds(UK)
- * Requirements: None
- * Misc: Price includes a half-sized AT voice card (including a DSP),
- software, documentation & a microphone (attachable to keyboard or
- speaker). A light-weight high-spec headset is an optional extra.
- * Contact:
-
- Mark Redwood
- Applied Voice Technologies
- 26 Danbury Street, Islington,
- London, UK, N1 8JU
- Ph: + 44 71 454 1224 : Fax: + 44 71 454 1225
-
-
-
- Voicetek Corp.
-
- * Platform: Unknown.
- * Description: Voicetek Corporation provides voice processing
- solutions, training and consulting services and an
- object-oriented, graphical Generations Platform for development of
- integrated computer telephony systems.
- * Contact: Voicetek Corporation
- 19 Alpha Road, Chelmsford, MA 01824, USA
- Ph: +1-508-250-9393, Fax: +1-508-250-9378
- WWW: http://www.voicetek.com/
-
-
-
- Votan VPC2100 Voice Card and VSP 1010 Speech Processor
-
- * Platform: DOS
- * VPC2100 Voice Card: a hardware and software system based on the
- TMS320C10, providing continuous speech recognition. The VPC2100
- consists of a circuit board, microphone, speaker, software, and
- documentation. It is designed to add voice I/O and telephone
- management capabilities to the PC/AT and compatibles. Features:
- + Voice store-and-forward at 4- to 16.4-Kb/s speed
- + Speaker-independent speech recognition (0-9, YES, NO)
- + Continuous speaker-dependent speech recognition
- + Telephone interface, pulse or tone dialing, call progress,
- and DTMF
- + Software for development, voice mail, telephone management,
- and VoiceKey
- + High-level applications-generator software
- * Votan VSP 1010 speech-processor board: can service a single voice
- channel, providing recognition, voice output, and telephone
- interfacing. Digital signal processing is performed by a TMS320
- integrated circuit.
- * Costs: Unknown
- * WWW: http://www.ti.com/sc/docs/dsps/develop/3rdparty/vot.htm
- * Contact: Votan Division, MOSCOM Corporation
- 6920 Koll Center Parkway, Suite 214, Pleasanton, CA 94566, USA
- Ph: +1-510-426-5600, Fax: +1-510-426-6767
-
-
-
- Voice Processing Corporation Speech Recognition Product Line
-
- * Platform: Unknown.
- * Description: Voice Processing Corporation (VPC) supplies automated
- speech recognition systems. VPC's products are used in the
- telecommunications, cellular and personal computer markets to
- enable computers to understand human speech. The company's VPro
- product line is sold to original equipment manufacturers (OEMs),
- value added resellers (VARs), system integrators and application
- developers. VPC's speech recognition systems are currently used in
- applications such as voice mail, voice activated dialing,
- interactive voice response, and command and control of personal
- computers.
- The following are descriptions of the Voice Processing
- Corporation's VPro Product Line: VProContinuous, VPro/XD, VPro/RT,
- VProCel, VProSpeller, VProPRL, VPro hardware platforms, and the
- application Osprey.
- More information is available on these products at the VPC WWW
- site: http://www.vpro.com/
- * VProContinuous(TM) is a speaker-independent, continuous digit
- recognizer. It recognizes digit strings spoken in a continuous
- manner, by any caller, without unnatural beeps or pauses.
- VProContinuous uses out-of-vocabulary rejection and word spotting
- technologies to reject extraneous words and phrases often spoken
- by callers. The VProContinuous vocabulary consists of the words
- "zero" through "nine," "yes," "no," and "oh." The product is
- language-independent. American English, Australian English,
- Brazilian Portuguese, Canadian French, Castilian Spanish, French,
- German, Italian, Mexican Spanish, Portuguese, Swiss German and
- U.K. English versions are available.
- * VPro/XD(TM) is a discrete or multiword speech recognizer for
- extra-demanding applications and/or vocabularies. This robust
- discrete product recognizes isolated discrete utterances (words or
- very short phrases). VPro/XD utilizes proprietary
- out-of-vocabulary rejection and word-spotting technologies.
- VPro/XD is speaker-independent and includes Talkover capability
- allowing speech-interrupt over prompts. Pre-trained vocabulary
- libraries are available in American English, Australian English,
- Brazilian Portuguese, Canadian French, Castilian Spanish, Central
- American Spanish, German, Italian, Mandarin Chinese, Mexican
- Spanish, Portuguese, Swiss German and UK English. Pre-trained
- vocabularies consisting of voice mail words, voice dialing words,
- call control words, banking, and emergency words are available in
- American English (both cellular and land-line).
- * VPro/RT(TM) is a discrete speech recognizer for rapid training of
- vocabularies in the field. This robust discrete product recognizes
- isolated discrete utterances. Application designers and end-users
- define the vocabulary of their choice and train the system in
- real-time either prior to system start-up, or adapting on-the-fly
- while the system is running live. Vocabularies can be subset, and
- applications involving thousands of words can be developed
- quickly. VPro/RT, which also supports Talkover, is suited to
- speaker-dependent recognition tasks, such as the personal
- directory of names in a voice-activated dialing application.
- VPro/RT is also good for applications that require
- speaker-independent vocabularies to be developed quickly in the
- field or those that require many vocabularies. VPro/RT can also be
- used as a tool for quick prototyping of applications.
- * VProCel consists of speaker-independent VProContinuous, VPro/XD
- and speaker-dependent VPro/RT specifically tuned for the cellular
- environment. The speaker-dependent discrete feature of VProCel
- allows for a user-defined 20-word personal directory, with a
- one-pass enrollment whereby users need only speak their chosen
- commands once. In addition, cellular-ready VPro/XD vocabularies
- consisting of voice-activated dialing command words are also
- available. VProCel is suited to voice-activated dialing
- applications using either digit strings or a listing of words in a
- personal directory.
- * VProSpeller is a recognizer that can determine which name or word
- is being spelled by a caller. Users may spell a string of letters
- (up to 32 letters) in an uninterrupted manner (without prompts or
- beeps between each letter). VProSpeller can recognize confusable
- letters by conducting an automated search of a database of words
- maintained by the application for the best candidates to match.
- * VProPRL: Designed for customers who wish to enable VPC speech
- recognition technologies on platforms other than those supported
- by VPro hardware, the VProPRL is a portable recognizer library of
- VProContinuous, VPro/XD and VPro/RT, which can be embedded into a
- wide variety of hardware platforms. It consists of a library of
- object modules which can be linked with a user application or
- task.
- * VPro Hardware Platforms: VPro-42, VPro-84, VPro-88 : The VPro
- platforms are ISA compliant PC/AT boards. Each supports four to
- eight Virtual Speech Processors (VSPs). Each VSP, depending on
- load factors, can handle multiple telephone lines. Application and
- host computers communicate with each of the VSPs as separate
- autonomous units. VPro platforms use Texas Instruments TMS320C31
- microprocessors which provide up to 133 MFLOPS of compute power.
- The platforms can have up to 8 megabytes of memory shared among
- all processors. In addition, each processor has 512K bytes of
- local memory. Both the PEB and MVIP PCM audio buses are supported
- by all VPro platforms.
- * Osprey is a call management software application that performs the
- kinds of telephone related activities typically done by a personal
- assistant, such as answering the phone, screening callers, routing
- calls, and taking and delivering messages. It is an automated
- phone attendant.
- * Price and availability: Contact Voice Processing Corporation
- * Contact: Kelli V. Smith
-
- Voice Processing Corporation
- 1 Main Street, Cambridge, MA, 02142 USA
- Ph: (617)494-0100 Fax: (617)494-4970
- e-mail: KSmith@vpro.com
- WWW: http://www.vpro.com/
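-
- The VProSpeller entry above describes resolving confusable letters by
- searching a word database for the best matching candidates. The sketch
- below illustrates that general idea only; it is not VPC's algorithm.
- The confusable letter groups, the cost values and the function names
- are all assumptions made for the example, and candidate words must
- have the same length as the letter string that was heard.

```python
# Illustrative only: letter groups that are assumed to sound alike.
CONFUSABLE_GROUPS = ["bcdegptvz", "mn", "fs", "ajk"]


def letter_cost(heard, actual):
    """Cost of hearing `heard` when the caller actually said `actual`."""
    if heard == actual:
        return 0.0
    for group in CONFUSABLE_GROUPS:
        if heard in group and actual in group:
            return 0.5  # cheap: the two letters are easily confused
    return 2.0  # expensive: the two letters do not sound alike


def word_cost(heard, word):
    """Total cost of matching a heard letter string against one word."""
    if len(heard) != len(word):
        return float("inf")  # this sketch ignores inserted or dropped letters
    return sum(letter_cost(h, a) for h, a in zip(heard, word))


def best_candidates(heard, vocabulary, limit=3):
    """Return up to `limit` vocabulary words, cheapest match first."""
    scored = sorted((word_cost(heard, w), w) for w in vocabulary)
    return [w for cost, w in scored if cost != float("inf")][:limit]


vocabulary = ["smith", "smyth", "swift", "jones", "lee"]
candidates = best_candidates("fnith", vocabulary)
```

- Here the recognizer is assumed to have heard "f-n-i-t-h"; because "f"
- and "s" share a group, as do "n" and "m", "smith" is the cheapest match
- in the hypothetical vocabulary.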
-
-
-
- Whisper
-
- See the new page for Microsoft speech recognition software.
- * Platform: Windows 95 and Windows NT 4.0
- * Description: Command and control recognition.
-
-
-
- WildCard Speech Products
-
- * Platform: Windows 3.1 and Windows 95
- * OfficeTalk for Windows: provides voice commands for dictation,
- navigation, command and control, and formatting for business uses
- of computers. Provides user voice access to a wide variety of
- software applications in office suites from Microsoft,
- Novell/WordPerfect, and Lotus. More information on the WildCard
- OfficeTalk page.
- * LawTalk for Windows: adds features and interfaces that meet the
- specific needs of legal users. More information on the WildCard
- LawTalk page.
- * VoiceCompanion for the Internet: Surf the net using voice
- commands. Controls browsers like Netscape and Microsoft Explorer.
- More information on the VoiceCompanion web page.
- * VoiceCompanion - RemoteAccess: Over the telephone remote access to
- your desktop PC, for voicemail, FAX forwarding and address book
- information. More information on the VoiceCompanion web page.
- * Availability: WildCard Technologies Inc.
- 180 West Beaver Creek Road, Richmond Hill, Ontario, Canada L4B 1B4
-
- Phone: (905) 731-6444, Fax: (905) 731-7017
- Email: sales@wildcardtech.com
- WWW: http://www.wildcardtech.com/
-
-
- ___________________________________________________________________________
-
- Q6.6: Speaker Recognition (Verification and Identification)
-
- * Introduction
- * In the FAQ
- * On the WWW
-
- Introduction
-
- Speaker recognition is the process of automatically recognizing who is
- speaking on the basis of individual information included in speech
- signals. It can be divided into Speaker Identification and Speaker
- Verification. Speaker identification determines which registered
- speaker provides a given utterance from amongst a set of known
- speakers. Speaker verification accepts or rejects the identity claim
- of a speaker - is the speaker the person they say they are?
-
- Speaker recognition technology makes it possible to use the speaker's
- voice to control access to restricted services, for example, phone
- access to banking, database services, shopping or voice mail, and
- access to secure equipment.
-
- Both technologies require users to "enroll" in the system, that is, to
- give examples of their speech to a system so that it can characterise
- (or learn) their voice patterns.
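-
- The difference between enrollment, identification and verification
- can be sketched in a few lines. This is a toy illustration, not the
- method of any product listed here: each utterance is assumed to be
- already reduced to a short feature vector, a speaker's template is
- simply the average of the enrollment vectors, and the threshold value
- is arbitrary.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enroll(samples):
    """Average a speaker's enrollment vectors into one reference template."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def identify(templates, utterance):
    """Identification: which enrolled speaker is closest to the utterance?"""
    return min(templates, key=lambda name: distance(templates[name], utterance))


def verify(templates, claimed, utterance, threshold):
    """Verification: accept the claim only if the utterance is close enough."""
    return distance(templates[claimed], utterance) <= threshold


# Hypothetical two-dimensional features for two enrolled speakers.
templates = {
    "alice": enroll([[1.0, 2.0], [1.2, 1.8]]),
    "bob": enroll([[4.0, 0.5], [3.8, 0.7]]),
}
utterance = [1.1, 2.1]

who = identify(templates, utterance)
claim_alice = verify(templates, "alice", utterance, threshold=0.5)
claim_bob = verify(templates, "bob", utterance, threshold=0.5)
```

- Identification always returns one of the enrolled speakers, whereas
- verification answers yes or no to a single identity claim, so only
- verification needs a threshold.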
-
- In the FAQ:
-
- * ImagineNation: Voice Activated UnLock Technology
- * Jialong He's Speaker Recognition (Identification) Tool
- * Keyware Biometric Security Products
- * SpeakerKey Voice Verifier from ITT
- * SpeakEZ Voice Print Speaker Verification
- * Voice Control Systems: Speaker Verification Technology
-
- On the WWW
-
- Survey of the State of the Art in Human Language Technology
- Report edited by Ronald A. Cole et al. with a section on
- Speaker Recognition.
- http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node47.html
-
- Speaker Identification And Verification: LIMSI Report
- A technical description.
- http://www.limsi.fr/Recherche/TLP/reco/2pg95-sv/2pg95-sv.html
-
- Long Index of References on Automatic Speaker Verification
- A list of more than 350 papers on speaker verification in text
- or BibTeX format. Provided by G.Matas.
- http://sig.enst.fr/~chollet/ForMehdi/SpRecV1.l_ind.html
-
- CAVE: Caller Verification in Banking and Telecommunications
- European consortium developing speaker recognition
- technologies.
- http://www.ptt-telecom.nl/cave/
-
- Hangai Lab demonstrations of speaker verification and speaker
- identification.
- Do it yourself demonstrations:
- http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech1.html
- http://miya8f05.ee.kagu.sut.ac.jp/study/speech/speech2.html
-
-
-
- Voice Activated UnLock Technology (VAULT): ImagineNation
-
- * Description: Password-based voice verification technology using a
- card to store voice-print data. Introductory information and the
- VAULT FAQ are provided on the ImagineNation WWW pages.
- * Contact: Imagine
- PO Box 212, Swansea, MA 02777, USA
- Ph: +1-508-678-9563
- Fax: 508-678-1470
- Email: feedback@ImagineNation.com
- WWW: http://www.ImagineNation.com/
-
-
-
- Jialong He's Speaker Recognition (Identification) Tool
-
- * Platform: SUN SPARC (SunOS), PC (MSDOS)
- * Description: This package contains a set of speaker recognition
- research utilities, including Gaussian mixture models, VQ codebook
- designing program and MLP network. They can also be used as
- general classifiers. The utilities are divided into the following
- categories:
- + Feature extraction and dimensional reduction
- cepstrum -- extract features from speech signals (LPCC, MFCC,
- etc.).
- search -- select effective features (SFS, SBS method).
- randline -- randomize a sequence, auxiliary utility.
- bin2asc -- binary to ASCII, auxiliary utility.
- + MLP network
- mlptrain -- MLP network training program.
- mlptest -- MLP network test program.
- + VQ codebook training and test programs
- lbglvq -- VQ codebook training program.
- nearest -- VQ codebook test program.
- + Gaussian mixture model (GMM)
- gmmtrain -- GMM training program.
- gmmtest -- GMM test program.
- Note: this is a research tool not a true speaker recognition
- system.
- * Availability: By anonymous ftp:
-
- MSDOS Version
- UK:
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkrtool.zip
- Germany:
- ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/spkrtool.zip
-
- Sun SPARC version, compiled with GNU C
- UK:
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/spkr_sun_v1.tar.gz
- Germany:
- ftp://ftp.informatik.uni-ulm.de/pub/NI/jialong/speaker_sun_v1.tar.gz
-
- * See also: Jialong He's Speech Recognition Research Tool
- * Contact: Jialong He
- email: jialong@neuro.informatik.uni-ulm.de
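-
- The VQ codebook approach named in the entry above (train one codebook
- per speaker, then classify by nearest codeword) can be sketched as
- follows. This is not code from the package: training uses plain Lloyd
- (k-means) iterations seeded with the first vectors rather than LBG
- codebook splitting, and the feature vectors and speaker names are
- made up for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(codebook, vec):
    """Index of the codeword closest to `vec`."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], vec))


def train_codebook(vectors, size, iterations=10):
    """Lloyd iterations, seeded with the first `size` training vectors."""
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[nearest_index(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # keep the old codeword if no vector fell in its cell
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def distortion(codebook, vectors):
    """Average squared distance from each vector to its nearest codeword."""
    total = sum(sq_dist(codebook[nearest_index(codebook, v)], v) for v in vectors)
    return total / len(vectors)


def classify(codebooks, vectors):
    """Pick the speaker whose codebook quantizes the vectors with least error."""
    return min(codebooks, key=lambda name: distortion(codebooks[name], vectors))


# Hypothetical two-dimensional training features for two speakers.
training = {
    "speaker_a": [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]],
    "speaker_b": [[5.0, 5.0], [5.1, 4.9], [6.0, 6.0], [6.2, 6.1]],
}
codebooks = {name: train_codebook(vecs, size=2) for name, vecs in training.items()}
result = classify(codebooks, [[0.1, 0.0], [1.0, 1.1]])
```

- In practice the vectors would be cepstral features (LPCC or MFCC) and
- the codebooks would hold many more codewords.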
-
-
-
- Keyware Biometric Security Products
-
- * Description: VoiceGuardian and S2 Security Server provide
- authentication and access control technologies. An online demo of
- Voice Guardian is available.
- * Contact: Keyware Technologies
- _USA_
- Keyware Technologies
- 500 West Cummings Park, Suite 3600, Woburn, MA 01801, USA
- Ph: (617) 933 1311, Fax: (617) 933 1554
- _Belgium_
- Keyware Technologies
- Excelsiorlaan 28-30, 1930 Zaventem, Belgium
- Ph: 32 2 721 4574, Fax: 32 2 721 5015
- _Email:_ sales@keywareusa.com
- _WWW:_ http://www.keywareusa.com/
-
-
-
- SpeakerKey Voice Verifier from ITT
-
- * Platform: Windows/Pentium and Solaris/SPARC
- * Description: SpeakerKey provides over-the-phone voice
- verification. It is configurable for use in a wide range of
- applications.
- SpeakerKey provides a Speaker Verification API (SVAPI).
- SpeakerKey uses two technologies: (1) speaker-independent digit
- recognition using hidden Markov models, (2) speaker verification
- using "Nearest Neighbour Matching with Likelihood Ratio Scoring
- and cohort speakers."
- Dr. Joe Campbell maintains a SpeakerKey FAQ on the WWW. It
- provides a more detailed description of SpeakerKey and discusses
- several speaker verification issues:
- http://www.vitro.bloomington.in.us:8080/~BC/REPORTS/SpeakerKeyFAQ.html
- * Requirements: Minimum 60 MHz Pentium (with sound card) or
- SPARCstation 5, plus phone line interface devices.
- * Price: Evaluation kits available from $75. Developer's kits are
- $1500. Run-time licenses are priced from $600 to $10,000 depending
- upon the number of users and/or verifications per hour. Application
- customization is available.
- * Contact: ITT Industries
- Fort Wayne, IN, USA
- Ph: +1-219-487-6321, Fax: +1-219-487-6126
- Email: speakerkey@itt.com
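-
- The phrase "nearest neighbour matching with likelihood ratio scoring
- and cohort speakers" quoted above refers to a general family of
- techniques: the match to the claimed speaker is judged relative to
- the match to other enrolled (cohort) speakers rather than against an
- absolute threshold. The sketch below shows one simple member of that
- family and is not ITT's implementation. It assumes that distance
- stands in for a negative log-likelihood, so the likelihood ratio
- becomes a difference of distances; the data and names are invented.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nn_distance(utterance, references):
    """Average distance from each test frame to its nearest reference frame."""
    total = sum(min(euclidean(f, r) for r in references) for f in utterance)
    return total / len(utterance)


def cohort_score(enrolled, claimed, utterance):
    """Claimed-speaker distance minus the mean distance to the cohort."""
    claimed_dist = nn_distance(utterance, enrolled[claimed])
    cohort = [
        nn_distance(utterance, refs)
        for name, refs in enrolled.items()
        if name != claimed
    ]
    return claimed_dist - sum(cohort) / len(cohort)


def verify(enrolled, claimed, utterance, threshold=0.0):
    """Accept when the claimed speaker matches better than the cohort."""
    return cohort_score(enrolled, claimed, utterance) < threshold


# Hypothetical enrolled reference frames for three speakers.
enrolled = {
    "alice": [[0.0, 0.0], [1.0, 0.0]],
    "carol": [[3.0, 3.0], [4.0, 3.0]],
    "dave": [[0.0, 5.0], [1.0, 5.0]],
}
utterance = [[0.1, 0.1], [0.9, 0.1]]

accept_alice = verify(enrolled, "alice", utterance)
accept_carol = verify(enrolled, "carol", utterance)
```

- Scoring against a cohort makes the decision less sensitive to channel
- or noise conditions that raise all distances at once.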
-
-
-
- SpeakEZ Voice Print Speaker Verification
-
- * Description: Designed to prevent cell phone theft and cloning
- fraud by comparing the cellular caller's statement of a
- pass-phrase to a stored digital "voice print" of the authorized
- subscriber. If the caller's voice patterns do not match the stored
- voice print, service will be denied or the caller will be referred
- to operator assistance for further validation processing. Features
- include:
- + Customer selected password.
- + Vocabulary and language independent.
- + No special hardware required by customer.
- + Multiple delivery options.
- * Contact: T-NETIX, Inc.
- 6675 South Kenton Street Englewood, CO 80111 USA
- Phone: (800) 352-8628, (303) 790-9111, Fax: (303) 790-9540
- WWW: http://www.t-netix.com/
-
-
-
- Voice Control Systems: Speaker Verification Technology
-
- * Description: SpeechPrint ID technology provides language
- independent speaker verification. Features:
- + Multiple speech input formats
- + Operates over various microphones or the telephone network
- + Can be used in conjunction with discrete and continuous
- recognition
- + Robust against background noise and spurious telephone
- channel noise
- For more information on features, hardware and software
- requirements, pricing and availability, contact Voice Control
- Systems, Inc. or visit the VCS WWW site or the SpeechPrint
- ID WWW page.
- * See also: VCS speech recognition products in Q6.5.
- * Contact: Voice Control Systems, Inc.
- 14140 Midway Rd., Dallas, Tx. 75244, USA
- Ph: +1-214-386-0300, Fax: +1-214-386-5555
- Email: sales@vcsi.com
- WWW: http://www.voicecontrol.com/
-
-
- ___________________________________________________________________________
-
- Q6.7: Integrated Speech Products
-
- This section lists those products which integrate different speech
- technologies into a single user package. For example, speech
- recognition and speech synthesis can be combined to provide a dialog
- management system. Strictly speaking, this doesn't really belong in
- Section 6 (Speech Recognition) but since these products all include
- speech recognition, it seems a reasonable place to put it for now!
-
- In the FAQ...
-
- * SpeechWorks from Applied Language Technologies, Inc.
- * Nortel Speech Technology Products
-
-
-
- SpeechWorks from Applied Language Technologies, Inc.
-
- * Description: SpeechWorks and companion products provide advanced
- speech recognition technology for the telephony market.
- SpeechWorks can be used by developers to "speech-enable" call
- center, messaging, enhanced services, and other types of
- applications. The three major system modules - SpeechWorks,
- DialogModules and SpeechBuilder - are described below. More
- detailed information is available from the Applied Language
- Technologies home page.
- ALTech develops and markets speech understanding software which
- provides large vocabulary, speaker-independent, phonetic speech
- recognition. ALTech's software contains a comprehensive set of
- features for speech-enabling telephone-based transactions and
- services. SpeechWorks is based on technology licensed from the
- Spoken Language Systems Group at the Massachusetts Institute of
- Technology.
- * SpeechWorks: provides the core speech recognition capabilities.
- Features include:
- + Phonetic segment-based, speaker-independent, large
- vocabulary, continuous speech recognition
- + Real-time vocabulary generation directly from text
- + Database integration
- + "Barge-in" capability
- + Adaptive channel normalization
- + "n-best" output and associated confidence scores
- + Support for multiple languages
- + Software-only or DSP-based implementations
- + Support for multiple platforms and operating systems (e.g.,
- SCO UNIX, WindowsNT, etc.)
- * DialogModules: manage the "conversation" between the system and
- the caller within an application. They provide high-level
- application building blocks which enable developers to quickly and
- easily add speech interfaces to computer telephony applications.
- Each DialogModule accomplishes a particular task within an
- application, ranging from "simple" tasks such as capturing a
- yes/no response or a phone number, to more complex tasks such as
- capturing credit card information or name and address information.
-
- DialogModules provide "out-of-the-box" functionality. They contain
- pre-built grammars, user-interface design, internal call flow and
- error recovery routines, parameters for customization and a set of
- C++ class libraries and C APIs.
- * SpeechBuilder: provides tools for customizing the DialogModules
- and for developing and maintaining applications. A GUI-based
- Vocabulary Editor provides the ability to generate and maintain
- vocabulary or word lists. Pronunciations can be generated
- automatically using the built-in dictionary or a set of
- text-to-phoneme rules.
- * Product Bundles: are available which combine SpeechWorks and
- multiple DialogModules into application templates for a set of
- generic application categories.
- + SpeechForms: provides an interactive method for
- entering data over the phone, such as ordering products,
- filling out surveys and completing registration forms.
- Typical applications include: order entry, reservations,
- catalog and literature requests, catalog shopping,
- subscriptions, change of service, claims, credit card
- activation, home banking, stock transactions, and warranty
- reservations.
- + SpeechQuery: is used to deliver information in
- response to voice requests over the phone, such as airline
- information, product delivery status and retirement benefit
- information. Typical applications include: order status,
- product information, account balance, flight status, movie
- listings, job listings, stock quotes, guide
- services, classified ads, claims status, dealer locator
- services, and technical support.
- + SpeechAgent: provides a set of modules for
- automating telephone-based voice messaging applications, such
- as integrated messaging, single-number services and
- voice-dialing. Typical applications include: voice messaging,
- voice dialing, auto attendant, address book access, email
- access, and scheduling.
- * Platform: ALTech's software can
- be deployed on industry-standard hardware platforms and operating
- systems including: Sun SPARC-based systems running SunOS or
- Solaris, IBM RS/6000s running AIX, HP systems running HP-UX, and
- 486/Pentium-based PCs and servers running Windows, WindowsNT, SCO
- UNIX, or Solaris. ALTech's systems are designed to run all or some
- of the software on a digital signal processor.
- * Availability: contact ALTech for licensing information.
- * Contact: Applied Language Technologies, Inc.
- 215 First Street, Cambridge, MA 02142
- Ph: 617-225-0012, Fax: 617-225-0322
- Email: to Alisa Moyer: moyer@altech.com
- WWW: http://www.altech.com/
-
-
-
- Nortel Speech Technology Products
-
- * Nortel's AudioGram Delivery Service (ADS):
- When a busy or no answer condition is encountered, an intercept
- message offers ADS, which provides a service to the calling party
- by taking a message automatically. ADS records the caller's
- message and attempts delivery repeatedly if needed until the
- message is delivered. ADS is comprised of four independent
- services: 0+, 1+ and Local, Intentional, and Millenium AudioGram.
- ADS services utilize Nortel's Flexible Vocabulary Recognition (FVR)
- voice-processing capabilities. ADS features include:
- + Cost-saving common service platform (NAV)
- + Builds upon existing network investment in toll
- infrastructure capabilities of AABS (Automated Alternate
- Billing Service)
- + Leverages the capabilities of existing TOPS (Traffic Operator
- Position System) attendants.
- More information is available on the Nortel Multimedia Network
- Applications WWW page for AudioGram Delivery Service.
- * Nortel's Voice-Activated Auto Attendant (VAAA):
- Replaces touch tone menu with easy-to-use voice interface. Geared
- to businesses and corporations to provide more effective
- management of incoming customer calls. Residing on the Network
- Applications Vehicle (NAV) platform, VAAA uses Flexible Vocabulary
- Recognition (speaker-independent) technology to recognize spoken
- words, and directs calls accordingly. Other features include:
- + Cost-saving common service platform (NAV)
- + Serves DTMF and rotary dial callers.
- + Handles incoming calls for all corporate users (Centrex, PBX,
- or key systems)
- More information is available on the Nortel Multimedia Network
- Applications WWW page for Voice-Activated Auto Attendant.
- * Nortel's Voice-Activated Dialing (VAD):
- Phoneme-based speech dialing capabilities provided through
- speaker-trained and speaker-independent technologies. Residing on
- the Network Applications Vehicle (NAV) platform, VAD enables
- subscribers to dial using speech, as well as to create and
- customize personal telephone directories. Other features include:
- + Cost-saving common service platform (NAV)
- + Speech playback and Text-to-speech synthesis
- + Dual Language capability (optional)
- + Speech Recording
- + Canadian French speechware (optional, prompts and FVR)
- + Spanish speechware (optional, prompts and FVR)
- + 75-name VAD directory size
- + Word-spotting
- + DTMF tone detection
- + Directory sharing
- + Scalable service deployment
- + Talk-through
- More information is available on the Nortel Multimedia Network
- Applications WWW page for Voice-Activated Dialing.
- * Nortel's Voice-Activated Premier Dialing (VAPD):
- Enables businesses to take advantage of the public network
- directories to stimulate customer calls. Residing on the Network
- Applications Vehicle (NAV) platform, VAPD uses Flexible Vocabulary
- Recognition (speaker-independent) technology to recognize business
- names, and routes calls to the appropriate business entity. VAPD
- promotes cost savings by utilizing a common service platform, the
- Network Applications Vehicle (NAV). It services DTMF callers as
- well as rotary dialers, and handles incoming calls for all
- corporate users: Centrex, PBX, and key systems. More information is
- available on the Nortel Multimedia Network Applications WWW
- page for Voice-Activated Premier Dialing.
- * Platform: This speech-based service operates on the Network
- Applications Vehicle (NAV) platform. NAV is a multi-application,
- digital signal processing platform supporting both speech- and
- display-based applications. The NAV platform provides the speech
- recognition capabilities and application logic used by the service.
- NAV features an open, modular hardware architecture and flexible
- software design. Other features include:
- + Scalable hardware - from 24 to over 2000 ports per NAV node;
- 1 to 24 independent application shelves per node
- + Powerful speech processing - speaker-independent and
- speaker-trained speech processing support
- + Reliability - N+1, N+M, and 2N redundancy
- + Central Management - access via graphical user interface to
- remote connections
- * See Also: Nortel Feature Planning Guide, reference number
- 50004.11; NAV Applications and Planning Guide, reference number
- 50118.16.
- Nortel's Multimedia web pages:
- http://www.nortel.com/entprods/multimedia/
- * Contact: NORTEL
- Multimedia Communications Systems Division
- Multimedia Network Applications
- 1000 Park Forty Plaza
- Durham, NC 27713 USA
- Ph: 1-800-4NORTEL
- WWW: http://www.nortel.com/entprods/multimedia/
-
-
- ___________________________________________________________________________
-
- Copyright (c) 1993-6 by Andrew Hunt, all rights reserved.
- This FAQ may be posted to any USENET newsgroup, on-line service, or BBS as
- long as it is posted in its entirety and includes this copyright statement.
- This FAQ may not be distributed for financial gain.
- This FAQ may not be included in any collections or compilations
- without express permission from the author.
-
-
-
- ---
-
- Andrew Hunt
- Speech Applications Group
- Sun Microsystems Laboratories Ph: (978) 442-2681
- 2 Elizabeth Drive, MS UCHL03-207 Fax: (978) 250-5067
- Chelmsford, MA 01824, USA Email: andrew.hunt@east.sun.com
-