- Subject: comp.speech Frequently Asked Questions - part 1/3
- Newsgroups: comp.speech,comp.answers,news.answers
- From: andrewh@speech.su.oz.au (Andrew Hunt)
- Date: 10 Nov 1994 01:28:52 GMT
-
- Archive-name: comp-speech-faq/part1
- Last-modified: 1994/11/04
-
-
- COMP.SPEECH FAQ POSTING - PART 1/3
-
-
- [Note: this document has been automatically extracted from
- a WWW site. This may introduce some formatting errors.]
-
-
- Comp.Speech Frequently Asked Questions
-
- The Frequently Asked Questions (FAQ) is a regular posting to
- comp.speech which attempts to answer some of the regular questions in
- the comp.speech newsgroup.
-
- The FAQ is not meant to discuss any topic exhaustively. It will
- hopefully provide readers with pointers on where to find useful
- information, especially material available on the Internet.
-
- If you have not already read the Usenet introductory material posted
- to "news.announce.newusers", please do. For help with FTP (file
- transfer protocol) look for a regular posting of "Anonymous FTP List -
- FAQ" in comp.misc, comp.archives.admin or news.answers.
-
- This FAQ is posted every 4 weeks to comp.speech, comp.answers &
- news.answers.
-
- It is also available for anonymous ftp from the comp.speech archive
- site :
- * ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete
-
- From the news.answers ftp site (and its mirrors)
- * ftp://rtfm.mit.edu/pub/usenet/news.answers/comp-speech-faq/*
-
- Or by sending email to mail-server@rtfm.mit.edu with the following
- line in the body of the message:
- * send usenet/news.answers/comp-speech-faq/*
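-
- The mail-server request above can be sketched in Python. This is only an
- illustration: it builds the message without sending it, the function name
- and the From address are made-up placeholders, and the mail server itself
- may no longer respond.
-
```python
from email.message import EmailMessage


def build_faq_request():
    """Build (but do not send) the one-line request described above."""
    msg = EmailMessage()
    msg["To"] = "mail-server@rtfm.mit.edu"  # address given in the FAQ
    msg["From"] = "you@example.org"  # hypothetical placeholder address
    # The body carries the single command line quoted in the FAQ.
    msg.set_content("send usenet/news.answers/comp-speech-faq/*\n")
    return msg


if __name__ == "__main__":
    print(build_faq_request())
```
-
- To actually send the message you would hand it to your own mail system.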
-
- Admin
-
- This release sees major changes to the format of the FAQ posting.
- These documents are now automatically extracted from the comp.speech
- World Wide Web site - http://www.speech.su.oz.au/comp.speech. I have
- tried to keep the integrity of the posting intact but there are
- probably errors. Please let me know of any you find.
-
- FAQ Sections
-
- The FAQ is divided into the following sections:
- * FAQ Contents
-
- * List of Speech Technology Products and Software
-
- * FAQ Section 1: General Information on Speech Technology
- * FAQ Section 2: Signal Processing
- * FAQ Section 3: Speech Coding and Compression
- * FAQ Section 4: Natural Language Processing
- * FAQ Section 5: Speech Synthesis
- * FAQ Section 6: Speech Recognition
-
- Comp.Speech FTP Site
-
- The comp.speech ftp site (which is described in Q1.2) contains the
- following:
- * Newsgroup Archives
- * Data Resources
- * General Information
- * Software
-
- Acknowledgements
-
- Hundreds of people have made contributions to the comp.speech FAQ over
- the last two years; there are too many to name individually. Special
- thanks go to Tony Robinson and Joe Campbell who have been particularly
- helpful.
-
- Maintenance
-
- The FAQ posting and the Comp.Speech WWW Site are maintained by
-
- Andrew Hunt
- ---
- Speech Technology Research Group
- Dept. of Electrical Engineering
- University of Sydney, NSW, 2006, Australia
- Ph: 61-2-692 4509
- Fax: 61-2-692 3847
- email: andrewh@speech.su.oz.au
-
-
- ===========================================================================
-
-
- COMP.SPEECH FAQ CONTENTS
-
- Introduction
-
- * Overview
- * List of Packages
-
- Section 1 : General Information on Speech Technology
-
- * Q1.1 What is comp.speech?
- * Q1.2 Where are the comp.speech archives?
- * Q1.3 Common abbreviations and jargon.
- * Q1.4 What are related newsgroups and mailing lists?
- * Q1.5 What are related journals and conferences?
- * Q1.6 What resources are available as handicap aids?
- * Q1.7 What speech data is available?
- * Q1.8 Speech File Formats, Conversion and Playing.
- * Q1.9 What "Speech Laboratory Environments" are available?
- * Q1.10 Miscellaneous Software and Other Resources.
-
- Section 2 : Signal Processing for Speech
-
- * Q2.1 What sampling do I need for speech?
- * Q2.2 How do I find the pitch of a speech signal?
- * Q2.3 How do I find the start and end points of a speech signal?
- * Q2.4 Where can I find FFT software?
- * Q2.5 What signal processing techniques are used in speech
- technology?
- * Q2.6 What speech sampling and signal processing hardware can I
- use?
- * Q2.7 How do I convert to/from mu-law format?
-
- Section 3 : Speech Coding and Compression
-
- * Q3.1 Speech compression techniques.
- * Q3.2 What are some good references/books on coding/compression?
- * Q3.3 What software is available? (Includes CELP & G.7xx)
-
- Section 4 : Natural Language Processing
-
- * Q4.1 What are some good references/books on NLP?
- * Q4.2 What NLP software is available?
-
- Section 5 : Speech Synthesis
-
- * Q5.1 What is speech synthesis?
- * Q5.2 How can speech synthesis be performed?
- * Q5.3 What are some good references/books on synthesis?
- * Q5.4 What software/hardware is available?
-
- Section 6 : Speech Recognition
-
- * Q6.1 What is speech recognition?
- * Q6.2 How can I build a very simple speech recogniser?
- * Q6.3 What does speaker dependent/adaptive/independent mean?
- * Q6.4 What does small/medium/large/very-large vocabulary mean?
- * Q6.5 What does continuous speech or isolated-word mean?
- * Q6.6 How is speech recognition done?
- * Q6.7 What are some good references/books on recognition?
- * Q6.8 What speech recognition packages are available?
-
-
- ===========================================================================
-
-
- FAQ: List of Packages
-
- The comp.speech FAQ provides information on a range of software,
- hardware and resources.
-
- Speech Data
-
- * Phonemic Samples
- * Linguistic Data Consortium (LDC)
- * Center for Spoken Language Understanding (CSLU)
- * PhonDat - A Large Database of Spoken German
- * Oxford Acoustic Phonetic Database
-
- Speech Processing Environments
-
- * Entropic Signal Processing System (ESPS) and Waves
- * CSRE: Canadian Speech Research Environment
- * OGI Speech Tools
- * Matlab plus Signal Processing Toolbox
- * Signalyze 3.0 from InfoSignal
- * Kay Elemetrics CSL (Computer Speech Lab) 4300
- * MacSpeech Lab II (MSL II)
- * N!Power
- * Ptolemy
- * Khoros
- * SpeechViewer II
-
- Other Resources
-
- * CMU Dictionary
- * Another Dictionary
- * BEEP dictionary
- * CUVOLAD dictionary
- * MRC database
- * Network Audio System
- * NEVOT (1.4v) from AT&T BL
- * Human Audio Perception Document
- * Homophone List
- * Auditory Toolbox for Matlab
- * Auditory Modeller 1
- * Auditory Modeller 2
-
- Audio I/O Hardware
-
- * Sun standard audio port (SPARC I & II)
- * Sun standard audio port (SPARC 10 & 20)
- * Ariel Signal Processors
- * IBM RS/6000 ACPA (Audio Capture and Playback Adapter)
- * Sound Galaxy NX , Aztech Systems
- * Sound Galaxy NX PRO, Aztech Systems
- * ATI Stereo F/X Sound Board
- * Various PC Sound Cards
-
- Compression Software and Hardware
-
- * File format conversion
- * shorten - a lossless compressor for speech signals
- * 32 kbps ADPCM
- * GSM 06.10 Compression
- * G.721/722/723 Compression
- * G.728 Compression
- * G.728 LD-CELP vocoder
- * U.S.F.S. 1016 CELP vocoder for DSP56001
- * 8 Kbit/s CELP on the TMS320C5x family of DSP chips
- * CELP 3.2a & LPC
-
- Natural Language Processing
-
- * Natural Language Software Registry (NLSR) - NLP Tools
- * Part of Speech Tagger
-
- Speech Synthesis
-
- * Orator Text-to-Speech Synthesizer
- * Text to phoneme program (1)
- * Text to phoneme program (2)
- * Text to phoneme program (3)
- * Text to speech program
- * "Speak" - a Text to Speech Program
- * TheBigMouth - a Text to Speech Program
- * TextToSpeech Kit
- * SGI Developers Toolbox Synthesiser
- * rsynth
- * SENSYN speech synthesizer
- * spchsyn.exe
- * CSRE: Canadian Speech Research Environment
- * Eloquence (currently an alpha release)
- * JSRU
- * Klatt-style synthesiser
- * DECTalk
- * Speech Manager and PlainTalk
- * Various Mac Speech Output Applications
- * MacinTalk
- * Monologue by Creative Labs
- * Lernout & Hauspie Text-To-Speech SDK
- * Tinytalk
- * Narrator - narrator.device
- * Infovox Product Range
- * SIMTEL-20
-
- Speech Recognition
-
- * HM2007 - Speech Recognition Chip
- * Voice Blaster Ver. 4.0
- * Votan
- * Entropic's HTK (HMM Toolkit)
- * DragonDictate version 3.0
- * DragonDictate for Windows
- * DragonVoiceTools
- * IBM Personal Dictation System
- * Osborne Personal Dictation System (in Australia)
- * VoiceServer for Windows
- * IN3 Voice Command for Windows
- * IN3 Voice Command
- * Phonetic Engine 400 (PE400) - Speech Systems, Inc.
- * SayIt
- * Kurzweil Voice for Windows 1.0
- * D6006 Voice Control Processor
- * Speech Commander - Listen for Windows
- * Voice-Trek 2.0
- * Visus SpeechKit
- * recnet
- * Lotec Speech Recognition Package
- * Myers' Hidden Markov Model software
- * Voice Command Line Interface
- * DATAVOX - French
- * PowerSecretary
- * ICSS system from IBM
- * Creative VoiceAssist
-
-
- ===========================================================================
-
-
- FAQ SECTION 1 - General
-
- Q1.1: WHAT IS COMP.SPEECH?
-
- Comp.speech is a newsgroup for discussion of speech technology and
- speech science. It covers a wide range of issues from application of
- speech technology, to research, to products and lots more. By nature
- speech technology is an inter-disciplinary field and the newsgroup
- reflects this. However, computer application is the basic theme of the
- group.
-
- The following is a list of topics but does not cover all matters
- related to the field (no order of importance is implied).
- * Speech Recognition - discussion of methodologies, training,
- techniques, results and applications. This should cover the
- application of techniques including HMMs, neural-nets and so on to
- the field.
-
- * Speech Synthesis - discussion concerning theoretical and
- practical issues associated with the design of speech synthesis
- systems.
-
- * Speech Coding and Compression - both research and application
- matters.
-
- * Phonetic/Linguistic Issues - coverage of linguistic and phonetic
- issues which are relevant to speech technology applications. Could
- cover parsing, natural language processing, phonology and prosodic
- work.
-
- * Speech System Design - issues relating to the application of
- speech technology to real-world problems. Includes the design of
- user interfaces, the building of real-time systems and so on.
-
- * Other matters - relevant conferences, jobs, books, software,
- hardware, and products.
-
- _________________________________________________________________
-
- Q1.2: WHERE ARE THE COMP.SPEECH ARCHIVES?
-
- comp.speech is being archived for anonymous ftp.
- * ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/archive/
-
- comp.speech/archive contains the articles as they arrive. Batches of
- 100 articles are grouped into a shar file, along with an associated
- file of Subject lines.
-
- Other useful information is also available in comp.speech/info.
- _________________________________________________________________
-
- Q1.3: COMMON ABBREVIATIONS AND JARGON.
- * ANN - Artificial Neural Network.
- * ASR - Automatic Speech Recognition.
- * ASSP - Acoustics Speech and Signal Processing
- * AVIOS - American Voice I/O Society
- * CELP - Code-book Excited Linear Prediction.
- * COLING - International Conference on Computational Linguistics
- * DTW - Dynamic Time Warping.
- * FAQ - Frequently Asked Questions.
- * HMM - Hidden Markov Model.
- * IEEE - Institute of Electrical and Electronics Engineers
- * JASA - Journal of the Acoustical Society of America
- * LPC - Linear Predictive Coding.
- * LVQ - Learning Vector Quantisation.
- * NLP - Natural Language Processing.
- * NN - Neural Network.
- * TI - Texas Instruments.
- * TIMIT - A large speech corpus from TI and MIT - see Q1.7
- * TTS - Text-To-Speech (i.e. synthesis).
- * VQ - Vector Quantisation.
-
- _________________________________________________________________
-
- Q1.4: WHAT ARE RELATED NEWSGROUPS AND MAILING LISTS?
-
- Newsgroups
-
- comp.ai - Artificial Intelligence newsgroup.
- Postings on general AI issues, language processing and AI
- techniques. Has a good FAQ including NLP, NN and other AI
- information.
-
- comp.ai.nat-lang - Natural Language Processing Group
- Postings regarding Natural Language Processing. Set up to cover
- a broad range of related issues and different viewpoints.
-
- comp.ai.nlang-know-rep - Natural Language Knowledge Representation
- Moderated group covering Natural Language.
-
- comp.ai.neural-nets - discussion of Neural Networks and related
- issues.
- There are often postings on speech-related matters - phonetic
- recognition, connectionist grammars and so on.
-
- comp.compression - occasional articles on compression of speech.
- FAQ for comp.compression has some info on audio compression
- standards.
-
- comp.dcom.telecom - Telecommunications newsgroup.
- Has occasional articles on voice products.
-
- comp.dsp - discussion of signal processing - hardware and algorithms
- and more.
- Has a good FAQ posting. Has a regular posting of a
- comprehensive list of Audio File Formats.
-
- comp.multimedia - Multi-Media discussion group.
- Has occasional articles on voice I/O.
-
- sci.lang - Language.
- Discussion about phonetics, phonology, grammar, etymology and
- lots more.
-
- alt.sci.physics.acoustics
- Some discussion of speech production & perception.
-
- alt.binaries.sounds.misc - posting of various sound samples
-
- alt.binaries.sounds.d - discussion about sound samples, recording
- and playback.
-
- Mailing Lists
-
- ECTL - Electronic Communal Temporal Lobe
- Founder & Moderator: David Leip. Moderated mailing list for
- researchers with interests in computer speech interfaces. This
- list serves a broad community including persons from signal
- processing, AI, linguistics and human factors. To subscribe,
- send your name, institute, department, daytime phone and email
- address to:
-
- + ectl-request@snowhite.cis.uoguelph.ca
-
- The ECTL archive site is
-
- + ftp://snowhite.cis.uoguelph.ca/pub/ectl
-
- Prosody Mailing List
- Unmoderated mailing list for discussion of prosody. The aim is
- to facilitate the spread of information relating to the
- research of prosody by creating a network of researchers in the
- field. If you want to participate, send the following one-line
- message to
-
- + listserv@msu.edu
- + subscribe prosody Your Name
-
- foNETiks
- A moderated monthly newsletter distributed by e-mail. It
- carries job advertisements, notices of conferences, and other
- news of general interest to phoneticians, speech scientists and
- others. The editors are Linda Shockey and Gerry Docherty. To
- subscribe send the following 1 line message to
-
- + mailbase@mailbase.ac.uk
- + join fonetiks your_first_name your_second_name
-
- Digital Mobile Radio
- Covers many areas, including speech topics such as speech coding
- and speech compression. To subscribe, mail Peter Decker at
- dec@dfv.rwth-aachen.de.
-
- _________________________________________________________________
-
- Q1.5: WHAT ARE RELATED JOURNALS AND CONFERENCES?
-
- Try the following commercially oriented magazines:
- * Voice News - monthly industry newsletter
- Stoneridge Technical Services
- PO Box 1891, Rockville, MD, 20850, USA
- Phone: (301) 424-0114
- * Voice Technology News
- * Voice Processing Magazine (1-800-854-3112)
- * Speech Technology (no longer published)
-
- Try the following technical journals (some contact addresses below):-
- * IEEE Transactions on Speech and Audio Processing (from Jan 93)
- * IEEE Signal Processing Magazine (from Jan 93)
- * IEEE Transactions on Acoustics, Speech, and Signal Processing
- (ASSP) (now obsolete)
- * Computational Linguistics (COLING)
- * Computer Speech and Language
- * Journal of the Acoustical Society of America (JASA)
- * AVIOS Journal
- * ASR News
-
- Try the following conferences:-
- * ICASSP Intl. Conference on Acoustics Speech and Signal Processing
- (IEEE)
- * ICSLP Intl. Conference on Spoken Language Processing
- * EUROSPEECH European Conference on Speech Communication and
- Technology
- * AVIOS American Voice I/O Society Conference
- * SST Australian Speech Science and Technology Conference
-
- Here are a few contact addresses:-
-
- Publications:
- IEEE Transactions on Speech and Audio Processing (from Jan 93)
- IEEE Transactions on Acoustics, Speech, and Signal Processing
- (ASSP) - now obsolete.
-
- Organization:
- Institute of Electrical and Electronics Engineers (IEEE)
-
- Contact:
- IEEE Service Center
- 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855, USA
- Phone: 1-800-678-IEEE or (201)981-0060
-
- Publications:
- Computer Speech and Language
-
- Contact:
- Academic Press, Ltd.
- 24-28 Oval Rd, London NW1, England
-
- Price:
- $136 (Institutions), $58 (Individuals)
-
- Publications:
- Computational Linguistics
-
- Organization:
- Association for Computational Linguistics
- MIT Press Journals
- 55 Hayward St, Cambridge, MA 02142, USA
- Phone: (617)253-2889
-
- _________________________________________________________________
-
- Q1.6: WHAT RESOURCES ARE AVAILABLE AS HANDICAP AIDS?
-
- Can anyone provide information on speech technology aids for the deaf,
- blind, speech impaired, physically impaired and other groups who may
- benefit from speech technology?
-
- SpeechViewer II
- * Platform: IBM Machines from Mod 25 on.
- * Description: SpeechViewer II is a speech therapy tool. It
- provides graphical feedback on various speech features so that
- speech impaired individuals can improve their speech. It works
- with an audio bandwidth of 7.3 kHz and thus allows the therapist
- to work with sustained vowels and fricatives. A wide range of
- graphics is used to provide adequate variability to hold client
- interest. An extensive set of statistics is gathered, which allows
- a therapist to do research or keep therapy records. The speech
- therapy modules are:
- + Awareness - Sound, Loudness, Pitch, Voicing Onset, Voicing
- + Skill Building - Pitch, Voicing, Phonology
- + Patterning - Pitch & Loudness - Waveform & Spectrogram,
- Spectra
- + Clinical Management - Profiles, Models, Client Data
- * Hardware: Requires an IBM M-ACPA (Multimedia-Audio Capture
- Playback Adapter). It has a TI TMS320C25 DSP chip. The input
- sampling rate is 44.1 kHz stereo, 88.2 kHz mono. This is a 16-bit
- card. It has the following jacks: mic in, stereo line in, stereo
- line out, speaker out. Note: This card is being replaced by Mwave
- technology. For more info on Mwave contact Texas Instruments.
- * Price:
- + The software is $2130 list, $1491 educational, part number
- 92F2066.
- + The M-ACPA is $370 list, $222 educational, part number
- 92F3378.
- + The MicroChannel adapter part number is 92F3379 (same price).
- * Contact: The Psychological Corporation (TPC) [IBM Authorized
- Remarketer]
- Phone: 1-800-228-0752 or contact IBM on 1-800-426-4832.
-
- _________________________________________________________________
-
- Q1.7: WHAT SPEECH DATA IS AVAILABLE?
-
- A wide range of speech databases have been collected. These databases
- are primarily for the development of speech synthesis/recognition and
- for linguistic research.
-
- Some databases are free but most appear to be available for a small
- cost. The databases normally require lots of storage space - do not
- expect to be able to ftp all the data you want.
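-
- To give a rough feel for the storage involved, the following Python sketch
- estimates the size of uncompressed speech. The 16 kHz, 16-bit, single
- channel defaults match the recording format quoted for some of the corpora
- below; the function name and the one-hour duration are illustrative
- assumptions only, and real corpora may add headers or use compression.
-
```python
def pcm_bytes(duration_s, sample_rate_hz=16000, bits_per_sample=16, channels=1):
    """Return the size in bytes of uncompressed linear PCM audio."""
    bytes_per_sample = bits_per_sample // 8
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)


if __name__ == "__main__":
    one_hour = pcm_bytes(3600)  # one hour of speech, an arbitrary example
    print("1 hour at 16 kHz, 16-bit mono: %.1f MB" % (one_hour / 1e6))
```
-
- At these settings one hour of speech takes about 115 MB, which is why the
- larger corpora are shipped on CD-ROM rather than fetched by ftp.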
-
- Phonemic Samples
- * First, some basic data. The following ftp sites have samples of
- English phonemes (American accent I believe) in Sun audio format
- files. See Question 1.8 for information on audio file formats.
- + ftp://sounds.sdsu.edu/.1/phonemes: This ftp site appears to
- be obsolete. Does anyone know a new address?
- + ftp://phloem.uoregon.edu/pub/Sun4/lib/phonemes : There
- appears to be some config problem with this ftp server.
- + ftp://sunsite.unc.edu/pub/multimedia/sun-sounds/phonemes
-
- Linguistic Data Consortium (LDC)
- * Briefly stated, the LDC has been established to broaden the
- collection and distribution of speech and natural language data
- bases for the purposes of research and technology development in
- automatic speech recognition, natural language processing and
- other areas where large amounts of linguistic data are needed.
- Here is a list of some of the corpora:
- + The TIMIT and NTIMIT speech corpora
- + The Resource Management speech corpus (RM1, RM2)
- + The Air Travel Information System (ATIS0) speech corpus
- + The Association for Computational Linguistics - Data
- Collection Initiative text corpus (ACL-DCI)
- + The TI Connected Digits speech corpus (TIDIGITS)
- + The TI 46-word Isolated Word speech corpus (TI-46)
- + The Road Rally conversational speech corpora (including
- "Stonehenge" and "Waterloo" corpora)
- + The Tipster Information Retrieval Test Collection
- + The Switchboard speech corpus ("Credit Card" excerpts and
- portions of the complete Switchboard collection)
- * Further resources made available in the first year (or two):
- + The Machine-Readable Spoken English speech corpus (MARSEC)
- + The Edinburgh Map Task speech corpus
- + The Message Understanding Conference (MUC) text corpus of FBI
- terrorist reports
- + The Continuous Speech Recognition - Wall Street Journal
- speech corpus (WSJ-CSR)
- + The Penn Treebank parsed/tagged text corpus
- + The Multi-site ATIS speech corpus (ATIS2)
- + The Air Traffic Control (ATC) speech corpus
- + The Hansard English/French parallel text corpus
- + The European Corpus Initiative multi-language text corpus
- (ECI)
- + The Int'l Labor Organization/Int'l Trade Union multi-language
- text corpus (ILO/ITU)
- + Machine-readable dictionaries/lexical data bases (COMLEX,
- CELEX)
- * Detailed information about the Linguistic Data Consortium is
- available by anonymous ftp from the address below. The files in the
- directory include more detailed information on the individual
- databases.
- + ftp://ftp.cis.upenn.edu/pub/ldc
- * For further information contact
- Linguistic Data Consortium
- 441 Williams Hall, University of Pennsylvania
- Philadelphia, PA 19104-6305
- Phone: +1 (215) 898-0464
- Fax: +1 (215) 573-2175
- e-mail: ldc@unagi.cis.upenn.edu
-
- Center for Spoken Language Understanding (CSLU)
- * The ISOLET speech database of spoken letters of the English
- alphabet. The speech is high quality (16 kHz with a noise
- cancelling microphone). 150 speakers x 26 letters of the English
- alphabet twice in random order. The ISOLET data base can be
- purchased for $100 by sending an email request to
- vincew@cse.ogi.edu. (This covers handling, shipping and medium
- costs). The data base comes with a technical report describing the
- data.
- * CSLU has a telephone speech corpus of 1000 English alphabets.
- Callers recite the alphabet with brief pauses between letters.
- This database is available to not-for-profit institutions for
- $100. The data base is described in the proceedings of the
- International Conference on Spoken Language Processing.
- + Contact vincew@cse.ogi.edu if interested.
- * CSLU has released for universities its Continuous English Speech
- Corpus. The corpus contains recorded speech from 690 different
- speakers, with label files at various levels - including word
- level and phonetic labels. The data were collected as part of the
- OGI Multi-language telephone corpus. CSLU provides speech corpora
- to all universities without charge. To order a corpus, print the
- license agreement/order form, complete it, and fax it to the CSLU.
- A description of the corpora and an order form are available by
- anonymous ftp:
- + ftp://speech.cse.ogi.edu/pub/releases
- * Contact: Mike Noel -
- email: noel@cse.ogi.edu Phone: (503) 690-1309
-
- PhonDat - A Large Database of Spoken German
- * The PhonDat continuous speech corpora are now available on CD-ROM
- media (ISO 9660 format).
- + PhonDat I (Diphone Corpus) : 6 CDs (1140.- DM)
- + PhonDat II (Train Enquiries Corpus): 1 CD ( 190.- DM)
- * PhonDat I comprises approx. 20,000 and PhonDat II approx. 1,500 signal
- files in high quality 16-bit 16 kHz recordings. The corpora come
- with documentation containing the orthographic transcription and a
- citation form of the utterances, as well as a detailed file format
- description. A narrow phonetic transcription is available for
- selected files from corpus I and II.
- * For information and orders contact
- Barbara Eisen
- Institut fuer Phonetik
- Schellingstr. 3 / II
- D 80799 Munich 40
- Tel: +49 / 89 / 2180 -2454 or -2758
- Fax: +49 / 89 / 280 03 62
-
- Oxford Acoustic Phonetic Database
- * Available on compact disc, from J. Pickering and B. Rosner. It
- contains data on vowel-consonant and consonant-vowel combinations
- in both stressed and unstressed locations. The languages covered
- include French, German, Hungarian, Italian, Japanese, British
- English, Spanish and English. For further information write to
- Electronic Publishing, Oxford University
- Press, Walton Street, Oxford OX2 6DP, UK.
- The ISBN is 0-19-268086-2
- * Contact:
- Prof. B. Rosner
- Dept. of Experimental Psychology
- South Parks Rd, Oxford, OX1 3UD, UK
- email: burton.rosner@wolfson.ox.ac.uk
-
- _________________________________________________________________
-
- Q1.8: SPEECH FILE FORMATS, CONVERSION AND PLAYING.
-
- Section 2 of this FAQ has information on mu-law coding.
-
- A very good and very comprehensive list of audio file formats is
- prepared by Guido van Rossum. The list is posted regularly to comp.dsp
- and alt.binaries.sounds.misc, amongst others. It includes information
- on sampling rates, hardware, compression techniques, file format
- definitions, format conversion, standards, programming hints and lots
- more. It is also available by ftp from
- * ftp://ftp.cwi.nl/pub/audio/AudioFormats.part1,2
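-
- As one concrete case, the Sun audio files mentioned in Q1.7 begin with a
- small big-endian header. The Python sketch below reads the six 32-bit
- fields of that header. It is a minimal illustration, not a full reader:
- the function and dictionary names are my own, only three common encoding
- codes are listed, and the audio format list above remains the place to
- check the details.
-
```python
import struct

AU_MAGIC = 0x2E736E64  # the four ASCII bytes ".snd"

# A few common encoding codes; other values exist and are reported as "other".
AU_ENCODINGS = {1: "8-bit mu-law", 2: "8-bit linear PCM", 3: "16-bit linear PCM"}


def parse_au_header(data):
    """Parse the fixed 24-byte header at the start of Sun audio data."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes of header")
    # Six unsigned 32-bit big-endian integers.
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("not a Sun audio file")
    return {
        "data_offset": offset,  # where the samples start
        "data_size": size,      # 0xFFFFFFFF means the size is unknown
        "encoding": AU_ENCODINGS.get(encoding, "other (%d)" % encoding),
        "sample_rate": rate,
        "channels": channels,
    }


if __name__ == "__main__":
    # Use a synthetic header so the example does not depend on a real file.
    header = struct.pack(">6I", AU_MAGIC, 24, 0xFFFFFFFF, 1, 8000, 1)
    print(parse_au_header(header))
```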
-
- _________________________________________________________________
-
- Q1.9: WHAT "SPEECH LABORATORY ENVIRONMENTS" ARE AVAILABLE?
-
- First, what is a Speech Laboratory Environment? A speech lab is a
- software package which provides the capability of recording, playing,
- analysing, processing, displaying and storing speech. Your computer
- will require audio input/output capability. The different packages
- vary greatly in features and capability - best to know what you want
- before you start looking around.
-
- Most general purpose audio processing packages will be able to process
- speech but do not necessarily have some specialised capabilities for
- speech (e.g. formant analysis).
-
- The following article provides a good survey.
- * Read, C., Buder, E., & Kent, R. "Speech Analysis Systems: An
- Evaluation" Journal of Speech and Hearing Research, pp 314-332,
- April 1992.
-
- Entropic Signal Processing System (ESPS) and Waves
- * Platform: Range of Unix platforms.
- * Description: ESPS is a comprehensive set of speech
- analysis/processing tools for the UNIX environment. The package
- includes UNIX commands, and a comprehensive C library (which can
- be accessed from other languages). Waves is a graphical front-end
- for speech processing. Speech waveforms, spectrograms, pitch
- traces etc can be displayed, edited and processed in X windows and
- Openwindows (versions 2 & 3). Waves also includes a signal
- labelling utility which provides multiple feature labelling and
- useful features for fast labelling of large speech databases.
- Entropic also distributes HTK (the Hidden Markov Model Toolkit).
- HTK is described in Section 6 of this FAQ.
- * Cost: On request.
- * Contact:
- Entropic Research Laboratory, Washington Research Laboratory
- 600 Pennsylvania Ave, S.E. Suite 202, Washington, D.C. 20003
- (202) 547-1420
- email - info@wrl.epi.com
-
- CSRE: Canadian Speech Research Environment
- * Platform: IBM/AT-compatibles
- * Description: CSRE is a microcomputer-based system designed to
- support speech research. CSRE provides a low-cost facility in
- support of speech research, using mass-produced and
- widely-available hardware. The project is non-profit, and relies
- on the cooperation of researchers at a number of institutions and
- fees generated when the software is distributed. Functions include
- speech capture, editing, and replay; several alternative spectral
- analysis procedures, with color and surface/3D displays; parameter
- extraction/ tracking and tools to automate measurement and support
- data logging; alternative pitch-extraction systems; parametric
- speech (KLATT80) and non-speech acoustic synthesis, with a variety
- of supporting productivity tools; and an experiment generator, to
- support behavioral testing using a variety of common testing
- protocols. A paper about the whole package can be found in:
- + Jamieson D.G. et al, "CSRE: A Speech Research Environment",
- Proc. of the Second Intl. Conf. on Spoken Language
- Processing, Edmonton: University of Alberta, pp. 1127-1130.
- * Hardware: Can use a range of data acquisition/DSP hardware
- * Cost: Distributed on a cost recovery basis.
- * Availability: For more information on availability contact
- Krystyna Marciniak
- email march@uwovax.uwo.ca
- Tel (519) 661-3901 Fax (519) 661-3805.
- For technical information
- email ramji@uwovax.uwo.ca
- * Note: Also included in Q5.4 on speech synthesis packages.
-
- OGI Speech Tools
- * Developers from the Center for Spoken Language Understanding
- (CSLU) at the Oregon Graduate Institute of Science and Technology
- (Portland Oregon)
- * Platform: Unix
- * Description: The OGI Speech tools include :
- + An X windows display tool (LYRE) for displaying data in a
- time synchronous fashion for a. the speech signal b.
- spectrograms c. phoneme labels, and other information.
- + A Neural Network (NOPT) training package.
- + A set of C library routines (LIBNSPEECH) for the
- manipulation of speech data, including: a. PLP Analysis, b.
- Rasta PLP Analysis, c. Linear Predictive Coding, d. Mel
- Cepstrum Coding, e. Fast Fourier Transform
- + A set of utilities for converting file formats such as ADC,
- NIST, mu-law, binary files, and ascii. Includes filtering.
- + A database utility (find_phone) to automate speech database
- related enquiries. It allows the user to specify a particular
- label or set of labels in a given context, display all
- occurrences of the label, and relabel the occurrences if
- desired.
- + A Vector-Quantizer based on the Linde Buzo and Gray (LBG)
- algorithm.
- + A set of PERL Scripts which have been used mainly to automate
- the use of the OGI Speech Tools.
- + MAN Pages for all routines and programs developed, as well as
- a User manual in both PostScript and TeX formats.
- * Misc: Software is written in ANSI C.
- * Availability: By anonymous ftp from
- + ftp://speech.cse.ogi.edu/pub/tools/
- * Contact: Try tools@cse.ogi.edu
-
- Matlab plus Signal Processing Toolbox
- * Platform: Wide range
- * Description: Matlab (MATrix LABoratory) is a technical computing
- environment for numerical computation and visualization based on a
- matrix oriented, interpreted programming language. The programming
- environment provides support for the development of customized
- operations, along with debugging facilities and a graphical user
- interface toolkit. Audio output is provided.
-
- A specialised Signal Processing Toolbox is available which
- provides many functions which are useful for speech analysis. It
- includes filter design, spectral estimation, statistical signal
- processing, waveform generation, and signal and spectrogram
- display.
-
- A specialised Auditory Toolbox is available which contains
- functions useful to people interested in auditory/cochlear models.
- A more detailed description is given in Q1.10.
- * Price: On request.
- * Contact: The Math Works Inc.
- 24 Prime Park Way, Natick, MA 01760-1500 USA
- Ph: 1-508-653 1415 Fax: 1-508-653 6284
- Email: info@mathworks.com
- * FTP: ftp://ftp.mathworks.com
- * WWW: http://www.mathworks.com/
-
- Signalyze 3.0 from InfoSignal
- * Platform: Macintosh
- * Description: Signalyze's basic conception revolves around up to
- 100 signals, displayed synchronously in HyperCard fashion on
- "cards". The program offers a complement of signal editing
- features, quite a few spectral analysis tools, manual scoring
- tools, pitch extraction routines, a good set of signal
- manipulation tools, and extensive input-output capacity.
-
- Handles multiple file formats: Signalyze, MacSpeech Lab,
- AudioMedia, SoundDesigner II, SoundEdit/MacRecorder, SoundWave,
- three sound resource formats, and ASCII-text. Sound I/O: Direct
- sound input from MacRecorder and similar devices, AudioMedia,
- AudioMedia II and AD IN, some MacADIOS boards and devices, Apple
- sound input (built-in microphone). Sound output via Macintosh
- internal sound, via SoundManager 3.0, some MacADIOS boards and
- devices as well as via the Digidesign 16-bit boards.
-
- It has a range of capabilities for creating, editing and
- manipulating label files with flexibility in labelling format.
- * Compatibility: MacPlus and higher (including II, IIx, IIcx,
- IIci, IIfx, IIvx, IIvi, Portable, all PowerBooks, Centris and
- Quadras). Takes advantage of large and multiple screens and 16/256
- color/grayscales. System 7.0 compatible. Runs in background with
- adjustable priority.
- * Misc: A demo available upon request. Manuals and tutorial
- included. It is available in English, French, and German. An
- UPDATER to version 2.48 is now available in:
- + The UNIL Gopher server (see last page of InfoSignal News 8)
- + The LAIP FTP server. Address: MACFL4082.unil.ch, machine
- no. 130.223.104.31
- Also available are a demo program, and current questions and answers.
- * Cost: Individual licence US$350, site license US$500, plus
- shipping. Upgrades from version 2.0 are available.
- * Contact:
- North America - Network Technology Corporation
- 91 Baldwin St., Charlestown MA 02129
- Fax: 617-241-5064 Phone: 617-241-9205
- Elsewhere contact
- InfoSignal Inc.
- C.P. 73, 1015 LAUSANNE, Switzerland,
- FAX: +41 21 691-1372,
- Email: 76357.1213@COMPUSERVE.COM.
-
- Kay Elemetrics CSL (Computer Speech Lab) 4300
- * Platform: Minimum IBM PC-AT compatible with extended memory (min
- 2MB) with at least VGA graphics. Optimal would be 386 or 486
- machine with more RAM for handling larger amounts of data.
- * Description: Speech analysis package, with optional separate LPC
- program for analysis/synthesis. Uses its own file format for data,
- but has some ability to export data as ASCII. The main
- editing/analysis program (but not the LPC part) has its own macro
- language, making it easy to perform repetitive tasks. Probably not
- much use without the extra LPC program, which also allows
- manipulation of pitch, formant and bandwidth parameters.
-
- Hardware includes an internal DSP board for the PC (requires ISA
- slot), and an external module containing signal processing chips
- which does A/D and D/A conversion.
- * Misc: A programmer's kit is available for programming signal
- processing chips (experts only). A speaker and microphone are
- supplied. Manuals are included.
- * Cost: Recently approx 6000 pounds sterling.
- * Contact:
- UK distributors are Wessex Electronics,
- 114-116 North Street, Downend, Bristol, B16 5SE
- Tel: 0272 571404.
- In the USA contact:
- Kay Elemetrics Corp,
- 12 Maple Avenue, PO Box 2025, Pine Brook, NJ 07058-9798
- Tel:(201) 227-7760
-
- MacSpeech Lab II (MSL II)
- * Platform: Macintosh
- * Description: A sound analysis and acquisition package for Macs.
- MSL II delivers the most common functions for speech analysis (FFTs,
- LPCs, f0 extraction, etc.) & produces grayscale spectrographic
- displays. Can be used for various speech technology and phonetic
- training tasks. The software can trade off accuracy and speed.
- * Hardware: Requires MacADIOS ("Macintosh Analog/Digital
- Input/Output System") hardware for speech I/O at 12/16 bits.
- * Misc: Software no longer updated by GW Instruments; MSL
- soft/hardware will not perform input/output on Quadras, for
- example, though analysis seems fine. Known to operate properly on
- systems as high as IIcx & IIfx.
- * Cost: $4990 (in May '92 price list; no MSL soft/hardware package
- listed in January '93).
- * Contact:
- GW Instruments
- 35 Medford Street, Somerville, MA 02143
- Phone: (617) 625-4096 Fax: (617) 625-1322
-
- N!Power
- * Platform: SUN, DEC and HP workstations.
- * Description: An object-oriented software package with a MOTIF
- GUI interface and a range of functionality for data
- analysis/editing, signal analysis, speech processing, real-time
- A/D and D/A, and 2D/3D interactive graphics. N!Power replaces ILS.
-
- N!Power can provide a Block Diagram user interface, menus,
- pop-ups, and a high-level IEEE standard symbolic scripting
- language. You can customize the blocks, menus and pop-ups with
- mouse point-and-click operations.
- * Contact:
- Signal Technology, Inc.
- 104 W. Anapamu, Suite J, Santa Barbara, CA 93101-3126
- Phone: 805-899-8300 FAX: 805-899-4344
- email: larry@signal.com
-
- Ptolemy
- * Platform: Sun SPARC, DecStation (MIPS), HP (hppa).
- * Description: Ptolemy provides a highly flexible foundation for
- the specification, simulation, and rapid prototyping of systems.
- It is an object oriented framework within which diverse models of
- computation can co-exist and interact. Ptolemy can be used to
- model entire systems.
-
- Ptolemy has been used for a broad range of applications including
- signal processing, telecommunications, parallel processing,
- wireless communications, network design, radio astronomy, real
- time systems, and hardware/software co-design. Ptolemy has also
- been used as a lab for signal processing and communications
- courses. Ptolemy has been developed at UC Berkeley over the past 3
- years. Further information, including papers and the complete
- release notes, is available from the FTP site.
- * Cost: Free
- * Availability: The source code, binaries, and documentation are
- available by anonymous ftp from
- + ftp://ptolemy.berkeley.edu/pub/README
-
- Khoros
- * Description: Public domain image processing package with a basic
- DSP library. Not particularly applicable to speech, but not bad
- for the price.
- * Cost: Free
- * Availability: By anonymous ftp from ftp://pprg.eece.unm.edu
-
- SpeechViewer II
- * Description: Speech Therapy Tool. See the detailed description
- in the handicap section - Q1.6.
-
- _________________________________________________________________
-
- Q1.10: MISCELLANEOUS SOFTWARE AND OTHER RESOURCES.
-
- CMU dictionary
- * Description: Phonemic transcriptions of 100,000 words with
- American English pronunciation.
- * Availability: By anonymous ftp from the directory
- + ftp://ftp.cs.cmu.edu/project/fgdata/dict
- with the files README, cmudict.0.2.Z, cmulex.0.1.Z, phoneset.0.1
-
- Dictionary
- * Description: A comprehensive word list which should contain most
- common American words, abbreviations, hyphenations, and even
- incorrect spellings. The word lists were compiled from a number of
- sources: commercial news services, UseNet news postings, existing
- dictionaries, name lists, company lists, UNIX man pages, project
- Gutenberg's E-texts, project Wordnet, received mailings, etc. The
- current size is 460,000 words.
- * Availability: By anonymous ftp from
- + ftp://wocket.vantage.gte.com:/pub/standard_dictionary
-
- Note 1: There seems to be some sort of network problem reaching
- the server.
- Note 2: There is a README file which explains the file formats.
-
- BEEP dictionary
- * Description: Phonemic transcriptions of 100,000 English words.
- (British English pronunciations)
- * Availability: By anonymous ftp from the file
- + svr-ftp.eng.cam.ac.uk/comp.speech/data/beep-0.3.tar.Z
-
- CUVOLAD dictionary
- * Description: Computer Usable Version of the Oxford Advanced
- Learner's Dictionary. Has British English pronunciations and parts
- of speech.
- * Availability: By anonymous ftp from the directory
- + ftp://black.ox.ac.uk/ota/dicts/710
-
- MRC database
- * Description: The Medical Research Council Psycholinguistic
- Database. Has British English pronunciations, parts of speech, word
- frequency and lots of other information.
- * Availability: By anonymous ftp from the directory
- + ftp://black.ox.ac.uk/ota/dicts/1054
-
- Network Audio System Release 1.1
- * Platforms: Various (includes SunOS, Solaris, SGI)
- * Description: A device-independent mechanism for transferring,
- playing and recording audio signals over a network. Has a range of
- features suited to networks.
- * Cost: Free
- * Availability: By anonymous ftp from
- + ftp://ftp.x.org:/contrib/audio/nas/netaudio-1.2.tar.gz
- Also available in the same directory are document files and some
- sample sounds.
-
- AF version AF3R1
- * Platforms: DEC workstations (Alpha and MIPS), SparcStation, SGI
- * Description: The AF System is a device-independent
- network-transparent system including client applications and audio
- servers. With AF, multiple audio applications can run
- simultaneously, sharing access to the actual audio hardware.
-
- The AF3R1 distribution of AF includes server support for Digital
- RISC systems running Ultrix, Digital Alpha AXP systems running
- OSF/1, SGI Indigo running IRIX 4.0.5, Sun Microsystems
- SPARCstations running SunOS 4.1.3, and Sun Microsystems
- SPARCstations running Solaris 2.3. The servers support audio
- hardware ranging from the built-in CODEC audio on SPARCstations
- and Personal DECstations to 48 kHz stereo audio using the DECaudio
- TURBOchannel module or the SPARCstation DBRI interface.
- * Availability: The source kit is distributed by anonymous ftp
- from
- + ftp://crl.dec.com/pub/DEC/AF
- * Contact: af-request@crl.dec.com
- + http://www.research.digital.com/CRL/projects/AF/home.html
-
- NEVOT (1.4v) from AT&T BL
- * Platforms: Sun Sparc Station (SunOS 4.1.x) and Silicon Graphics
- * Description: Audio-conferencing tool which supports both
- point-to-point and broadcasting of audio using multicast IP. Audio
- encoding:
- + PCM 64 kb/s 8-bit u-law encoded 8 kHz PCM (G.711)
- + ADPCM 32 kb/s [Sun only] (G.721)
- + DVI ADPCM 32 kb/s
- + ADPCM 24 kb/s [Sun only] (G.723)
- + CELP 4.8 kb/s
- + LPC 2.4 kb/s
- Source is available.
- * Availability: by anonymous ftp from
- + ftp://gaia.cs.umass.edu/pub/hgschulz/nevot
- * Contact: Henning Schulzrinne (hgs@research.att.com)
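-
- Note on the bit rates listed above: they follow directly from the
- 8 kHz sampling rate multiplied by the number of bits per sample
- (8 bits for u-law PCM, 4 for 32 kb/s ADPCM, 3 for 24 kb/s ADPCM).
- The Python sketch below shows that arithmetic together with the
- continuous mu-law companding curve (mu = 255). It is not NEVOT code
- and not a bit-exact G.711 encoder: G.711 uses a segmented
- approximation of this curve. The function names are assumptions made
- for illustration.

```python
import math

MU = 255  # companding parameter of the u-law curve


def mu_law_compress(x, mu=MU):
    """Map a sample x in [-1, 1] to [-1, 1] using continuous mu-law."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def bit_rate_kbps(sample_rate_hz, bits_per_sample):
    """Bit rate of a fixed-rate codec in kb/s (1 kb = 1000 bits)."""
    return sample_rate_hz * bits_per_sample / 1000


if __name__ == "__main__":
    for name, bits in (("u-law PCM", 8), ("ADPCM", 4), ("ADPCM", 3)):
        print(name, bits, "bits/sample ->", bit_rate_kbps(8000, bits), "kb/s")
    # Quiet samples are boosted before quantisation, loud ones compressed.
    for x in (0.01, 0.1, 1.0):
        print(x, "->", round(mu_law_compress(x), 4))
```

- The compression step is why 8-bit u-law speech sounds better than
- 8-bit linear PCM: small amplitudes, which dominate speech, get a
- larger share of the available quantisation levels.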
-
- Human Audio Perception Document
- * Description: Document prepared by Argiris Kranidiotis on the
- human audio perception system. It lists a number of references,
- gives plenty of numbers and some equations.
- * Availability: by anonymous ftp from the comp.speech archive site
- + ftp://svr-ftp.eng.cam.ac.uk/comp.speech/info/HumanAudioPerception
- * Contact:
- Argiris A. Kranidiotis
- University Of Athens, Informatics Department
- email: akra@zeus.di.uoa.ariadne-t.gr
-
- Homophone List
- * A list of homophones in General American English is available by
- anonymous FTP from the comp.speech archive site:
- + ftp://svr-ftp.eng.cam.ac.uk/comp.speech/data/homophones-1.01.txt
-
- Auditory Toolbox for Matlab
- * Description: This toolbox provides extensions to Matlab which
- are useful to people interested in auditory/cochlear modeling.
- [Matlab is described in the previous section.] This toolbox has
- been tested on both Macintosh and Unix computers. It includes the
- following major models:
- + Lyon's Passive Long Wave Cochlear Model (our conventional
- model)
- + Patterson-Holdsworth ERB Filter bank with Meddis Hair cell
- + Seneff's Auditory Model (Stages I and II)
- + MFCC (Mel-scale frequency cepstral coefficients from the ASR
- world)
- + Spectrogram
- + Correlogram generation and pitch modeling
- + Simple vowel synthesis
- * Availability: By anonymous FTP from the following site:
- + ftp://ftp.apple.com/pub/malcolm
- The following files are available:
- + 419487 AuditoryToolbox.mif.Z
- + 1372976 AuditoryToolbox.psc.Z
- + 573215 AuditoryToolbox.sea.hqx
- + 92160 AuditoryToolbox.tar
- + 36405 AuditoryToolbox.tar.Z
- The ".mif.Z" file is a Unix compressed version of the FrameMaker
- documentation. The ".psc.Z" file is a Unix compressed version of
- the Postscript documentation. The ".tar" and ".tar.Z" files are
- Unix TAR archives containing all of the m-functions and C-MEX
- source code. Finally, the ".sea.hqx" file is a Macintosh
- self-extracting archive that has been encoded using BinHex. We do
- provide precompiled versions of the three MEX functions for the
- Macintosh.
- * Misc: Our lawyers ask us to remind you that there is no
- warranty. We've done some testing but we undoubtedly missed
- things.
- * Contact:
- Malcolm Slaney: Interval Research.
- Email: malcolm@interval.com
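-
- Note on the mel scale used by MFCC front ends such as the one listed
- above: the Python sketch below uses one commonly quoted mapping,
- mel = 2595 * log10(1 + f/700). This is an assumption for
- illustration; the Auditory Toolbox may use a different mel mapping,
- and none of the names below come from it. The sketch also shows how
- filter centre frequencies can be spaced evenly on the mel axis.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_centres(low_hz, high_hz, n):
    """Return n centre frequencies (Hz) evenly spaced in mel between the edges."""
    m_lo, m_hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (m_hi - m_lo) / (n + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n)]


if __name__ == "__main__":
    # Centres crowd together at low frequencies and spread out higher up.
    for f in mel_centres(0.0, 4000.0, 5):
        print(round(f, 1))
```

- The widening spacing at high frequencies mirrors the coarser
- frequency resolution of hearing there, which is the motivation for
- using a mel-spaced filter bank before taking the cepstrum.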
-
- Auditory Modeller 1
- * Description: John Holdsworth's implementation of a gammatone
- filter bank and Roy Patterson's spiral model, in C (with X-window
- display).
- * Availability: By anonymous ftp from
- + ftp://ftp.mrc-apu.cam.ac.uk/pub/aim
-
- Auditory Modeller 2
- * Description: Lowel O'Mard's implementation of peripheral
- filtering, Ray Meddis's hair cell model and other stuff in C (as a
- library of routines).
- * Availability: By anonymous ftp from
- + ftp://suna.lut.ac.uk/public/hulpo/lutear
-
- _________________________________________________________________
-
-
-
-
- Andrew Hunt
- ---
- Speech Technology Research Group Ph: 61-2-351 4509
- Dept. of Electrical Engineering Fax: 61-2-351 3847
- University of Sydney, NSW, 2006, Australia email: andrewh@speech.su.oz.au
-
-