- Path: senator-bedfellow.mit.edu!faqserv
- From: andrew.hunt@east.sun.com (Andrew Hunt)
- Newsgroups: comp.speech,comp.answers,news.answers
- Subject: comp.speech Frequently Asked Questions - part 2/3
- Supersedes: <comp-speech-faq/part2_897652698@rtfm.mit.edu>
- Followup-To: comp.speech
- Date: 12 Jul 1998 12:00:23 GMT
- Organization: Speech Applications Group, Sun Microsystems Laboratories
- Lines: 1469
- Approved: news-answers-request@MIT.Edu
- Expires: 23 Aug 1998 12:00:04 GMT
- Message-ID: <comp-speech-faq/part2_900244804@rtfm.mit.edu>
- References: <comp-speech-faq/part1_900244804@rtfm.mit.edu>
- Reply-To: andrew.hunt@east.sun.com (Andrew Hunt)
- NNTP-Posting-Host: penguin-lust.mit.edu
- Summary: Information on Speech Technology
- X-Last-Updated: 1998/07/08
- Originator: faqserv@penguin-lust.MIT.EDU
- Xref: senator-bedfellow.mit.edu comp.speech:18456 comp.answers:32122 news.answers:134643
-
- Archive-name: comp-speech-faq/part2
- Last-modified: 1998/07/06
- URL: http://www.speech.su.oz.au/comp.speech/
-
- COMP.SPEECH FAQ POSTING - PART 2/3
-
-
- [Note: this document has been automatically extracted from a WWW site:
- http://www.speech.su.oz.au/comp.speech/
- This may introduce some formatting errors.]
-
-
- Signal Processing for Speech
-
- comp.speech FAQ Section 2
-
- * SpeechLinks: Signal Processing for Speech
- * Q2.1: What sampling do I need for speech?
- * Q2.2: Finding the pitch of a speech signal
- * Q2.3: How do I find the start and end points of a speech
- signal?
- * Q2.4: Where can I find FFT software?
- * Q2.5: Signal processing in speech technology
- * Q2.6: Speech sampling and signal processing hardware
- * Q2.7: How do I convert to/from mu-law format?
- * Q2.8: Signal Processing Software
-
-
- ___________________________________________________________________________
-
- Q2.1: What sampling do I need for speech?
-
- For recorded speech to be understood by humans you need an 8kHz
- sampling rate or more and at least 8 bit sampling. This produces poor
- quality speech, but it can be understood.
-
- Improvements can be achieved by increasing the number of bits in
- sampling to 12bits or 16bits, or by using a non-linear encoding
- technique such as mu-law or A-law (see Q2.7). This improves the
- "signal-to-noise" ratio.
-
- Increasing the sampling rate above 8kHz, say to 10kHz, 16kHz or 20kHz,
- improves the frequency response: a sampled signal can only represent
- frequencies up to half the sampling rate, so the higher the sampling
- frequency the better the high frequency content will be. A 16kHz
- sampling rate is a reasonable target for high quality speech recording
- and playback.
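-
- As a rough illustration of the two paragraphs above, the following C
- sketch computes the highest representable frequency (half the sampling
- rate), the raw data rate, and the textbook signal-to-noise estimate for
- an ideal linear quantiser driven by a full-scale sine wave (about
- 6.02 dB per bit plus 1.76 dB). The function names are illustrative and
- not taken from any package, and real converters fall short of the
- ideal figure.
-
```c
/* Back-of-envelope figures for a linear PCM sampling setup.
 * All names here are hypothetical, chosen for illustration only. */

/* Highest frequency a given sampling rate can represent. */
double nyquist_hz(double sample_rate_hz)
{
    return sample_rate_hz / 2.0;
}

/* Raw (uncompressed) data rate in bits per second. */
long bit_rate_bps(long sample_rate_hz, int bits_per_sample, int channels)
{
    return sample_rate_hz * bits_per_sample * channels;
}

/* Ideal SNR in dB of an N-bit uniform quantiser for a full-scale
 * sine wave: approximately 6.02 * N + 1.76. */
double ideal_snr_db(int bits_per_sample)
{
    return 6.02 * bits_per_sample + 1.76;
}
```
-
- Under these assumptions, 8kHz 8 bit mono sampling costs 64000 bits per
- second and gives roughly 50 dB, while 16 bit sampling gives roughly
- 98 dB.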
-
- When doing speech recognition you need to remember that your
- computer is not as good as your ear so it will have trouble with poor
- quality sounds. The choice of an appropriate sampling setup depends
- very much on the speech recognition task and the amount of computer
- power available.
-
-
- ___________________________________________________________________________
-
- Q2.2: Finding the pitch of a speech signal
-
- This topic comes up regularly in the comp.dsp newsgroup. Question 2.5
- of the FAQ posting for comp.dsp gives a comprehensive list of
- references on the definition, perception and processing of pitch. The
- comp.dsp FAQ posting is posted regularly to the comp.dsp newsgroup,
- and is also available by ftp and on the WWW:
-
- * http://www.bdti.com/faq/dsp_faq.htm
- * ftp://rtfm.mit.edu/pub/usenet/comp.dsp/
-
- The following provide pitch tracking software:
-
- * Most of the speech processing environments listed in Q1.9
- including CSRE, ESPS, Kay Elemetrics Computer Speech Lab, OGI
- Speech Tools, Speech Filing System, Signalyze, Soundscope.
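-
- This FAQ does not describe a pitch algorithm itself. For orientation
- only, one widely used approach is to find the lag at which the
- short-time autocorrelation of a voiced frame peaks. The sketch below
- is a minimal version of that idea and is not the method used by any of
- the packages listed above. The names estimate_pitch_lag and lag_to_hz
- are hypothetical, and the caller must choose the lag range (for
- example, lags of 20 to 160 samples at 8kHz cover roughly 50 to 400 Hz).
-
```c
#include <stddef.h>

/* Return the lag (in samples) in [min_lag, max_lag] with the largest
 * autocorrelation, or 0 if no lag gives a positive correlation.
 * This is an unnormalised autocorrelation: a sketch, not a robust
 * pitch tracker (no voicing decision, no octave-error handling). */
size_t estimate_pitch_lag(const short *x, size_t n,
                          size_t min_lag, size_t max_lag)
{
    size_t best_lag = 0;
    double best = 0.0;
    size_t lag, i;

    for (lag = min_lag; lag <= max_lag && lag < n; lag++) {
        double r = 0.0;
        for (i = 0; i + lag < n; i++)
            r += (double)x[i] * (double)x[i + lag];
        if (r > best) {
            best = r;
            best_lag = lag;
        }
    }
    return best_lag;
}

/* Convert a lag in samples to a frequency in Hz. */
double lag_to_hz(size_t lag, double sample_rate_hz)
{
    return lag ? sample_rate_hz / (double)lag : 0.0;
}
```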
-
-
- ___________________________________________________________________________
-
- Q2.3: Finding start and end points of a speech signal
-
- End-point detection algorithms identify sections in an incoming audio
- signal that contain speech. Accurate end-pointing is a non-trivial
- task; however, reasonable behaviour can be obtained for inputs which
- contain only speech surrounded by silence (no other noises). Typical
- algorithms look at the energy or amplitude of the incoming signal and
- at the rate of "zero-crossings". A zero-crossing is where the audio
- signal changes from positive to negative or vice versa. When the
- energy and zero-crossings are at certain levels, it is reasonable to
- guess that there is speech. More detailed descriptions are provided in
- the papers cited below and in the documentation for the following
- software.
-
- End-point detection software is available from:
-
- * ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/tools/ep.1.0.tar.gz
- * ftp://ftp.isip.msstate.edu/pub/software/signal_detector/sigd_v2.2.tar.gz
-
- Plenty of research papers have been presented on end-pointing. Try the
- following:
-
- * Rabiner LR, Sambur MR, "An Algorithm for Determining the Endpoints
- of Isolated Utterances", Bell System Technical Journal, Vol 54,
- No. 2, pp 297-315, 1975.
- * Drago, P.G. et al. "Digital Dynamic Speech Detectors." IEEE Trans
- on Communications, Vol 26, No 1, Jan 78, pp. 140-145.
- * Newman, W.C. "Detecting Speech with an Adaptive Neural Network."
- Electronic Design. 22 March 1990.
- * Taboada, J. et al. "Explicit Estimation of Speech Boundaries" IEE
- Proc. Sci. Meas. Technol., Vol 141, No.3, May 1994, pp 153-159.
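-
- The energy and zero-crossing measures described at the start of this
- answer can be sketched in C as follows. This is a minimal illustration
- and not the algorithm of any paper or package listed here: the
- function names and the decision rule in frame_has_speech are
- assumptions, and the thresholds are hypothetical values that would
- have to be tuned against the background noise of a real recording.
-
```c
#include <stddef.h>

/* Mean squared amplitude of one frame of 16 bit samples. */
double frame_energy(const short *x, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += (double)x[i] * (double)x[i];
    return n ? sum / (double)n : 0.0;
}

/* Number of sign changes between consecutive samples.
 * A sample of exactly zero is treated as non-negative. */
size_t zero_crossings(const short *x, size_t n)
{
    size_t count = 0;
    size_t i;

    for (i = 1; i < n; i++)
        if ((x[i - 1] >= 0) != (x[i] >= 0))
            count++;
    return count;
}

/* Crude per-frame guess: call it speech if the frame is loud, or if it
 * is not silent and crosses zero often (as weak fricatives tend to). */
int frame_has_speech(const short *x, size_t n,
                     double energy_threshold, size_t zc_threshold)
{
    double e = frame_energy(x, n);

    if (e > energy_threshold)
        return 1;
    return e > 0.0 && zero_crossings(x, n) > zc_threshold;
}
```
-
- A practical end-pointer would apply such a test frame by frame and
- then smooth the decisions over time before marking the start and end
- of an utterance.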
-
-
- ___________________________________________________________________________
-
- Q2.4: FFT Software
-
- * Comprehensive list of FFT software
- Links to over 65 different pieces of one-dimensional FFT code.
- http://tjev.tel.etf.hr/josip/DSP/fft.html
-
- * FFT Software including optimised fft routines and mixed-radix
- algorithms
- ftp://usc.edu/pub/C-numanal/fft-stuff.tar.gz
- OR,
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/fft-stuff.tar.gz
-
- * mixfft03.zip: C-source for a very fast arbitrary N FFT routine
- The C-source is ShareWare: read the text file included in the
- package before using the FFT routine commercially.
- Jens J. Nielsen: jnielsen@internet.dk
- Available from
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/analysis/mixfft03.zip
- OR ftp://ftp.coast.net/simtel/msdos/c/mixfft03.zip
-
- * FFTW
- FFTW is a C subroutine library for computing the FFT in one or
- more dimensions. It is not limited to sizes that are powers of
- two, and includes real-complex and parallel transforms.
- Also on the FFTW web site are benchmarks comparing the
- performance and accuracy of many public-domain FFT
- implementations on a variety of platforms, as well as links to
- other sources of FFT code and information.
- Available from http://theory.lcs.mit.edu/~fftw
- Developed by Matteo Frigo and Steven G. Johnson:
- fftw@theory.lcs.mit.edu
-
-
- ___________________________________________________________________________
-
- Q2.5: Signal processing in speech technology
-
- This question is far too big to be answered in a FAQ posting. Here are
- some WWW resources and books which cover the area well.
-
- Tony Robinson's Course Notes
-
- Dr. Tony Robinson of the Engineering Dept of Cambridge University has
- put his Speech Analysis course notes on the web. The base page is
- http://svr-www.eng.cam.ac.uk/~ajr/SA95/. There is information on the
- following:
-
- * Sampling theory
- * Filter bank analysis
- * Short-term Fourier analysis
- * Linear prediction analysis
- * Formant analysis and voicing analysis
- * Speech coding
- * and more....
-
- Joseph Picone's Course Notes
-
- Joseph Picone of the Institute for Signal and Information Processing
- (ISIP) at Mississippi State University has put two sets of course
- notes on the web:
-
- EE 4773/6773: Digital Signal Processing
- The course covers sampling, frequency analysis, z-transforms,
- filter design and more. The WWW site provides the syllabus,
- assignments, some source code data, exams, homework and
- solutions, lecture notes and more.
-
- EE 8993: Fundamentals of Speech Recognition
- The course covers background probability and
- phonetics/acoustics, speech signal analysis, dynamic
- programming, dynamic time warping, hidden Markov modelling,
- language modelling, neural networks, etc. The WWW site
- provides the syllabus and lecture notes.
-
- Signal Processing Home page
-
- The Signal Processing Home page has information on a range of DSP
- issues. It includes references to a range of software and much more.
- http://tjev.tel.etf.hr/josip/DSP/sigproc.html
-
- Books and other References
-
- There are many good books which discuss signal processing for speech:
-
- * Digital processing of speech signals; L. R. Rabiner, R. W.
- Schafer. Englewood Cliffs; London: Prentice-Hall, 1978
- * Voice and Speech Processing; T. W. Parsons. New York; McGraw Hill
- 1986
- * Computer Speech Processing; ed Frank Fallside, William A. Woods
- Englewood Cliffs: Prentice-Hall, c1985
- * Digital speech processing : speech coding, synthesis, and
- recognition edited by A. Nejat Ince; Kluwer Academic Publishers,
- Boston, c1992
- * Speech science and technology; edited by Shuzo Saito pub. Ohmsha,
- Tokyo, c1992
- * Speech analysis; edited by Ronald W. Schafer, John D. Markel, New
- York, IEEE Press, c1979
- * Applied Speech Technology Edited by: Ann Syrdal (AT&T Bell Labs,
- Holmdel, New Jersey), Raymond Bennett (Ameritech, Hoffman Estates,
- Illinois) and Steven Greenspan (AT&T Bell Labs, Murray Hill, New
- Jersey). Publisher: CRC Press.
- * Speech Communication: Human and Machine Douglas O'Shaughnessy,
- Addison Wesley series in Electrical Engineering: Digital Signal
- Processing, 1987.
- * Discrete-time processing of speech signals; John R Deller, John G
- Proakis, John H L Hansen; Macmillan 1993.
- * Signal processing of speech; F J Owens; Macmillan 1993.
-
-
- ___________________________________________________________________________
-
- Q2.6: Speech sampling and signal processing hardware
-
- In addition to the following information, have a look at the Audio
- File format document prepared by Guido van Rossum (see details in
- Section 1.8).
-
- Information is included on hardware for the following systems:
-
- * Macintosh Audio Hardware
- * PC Audio Hardware
- * Unix Audio Hardware
-
- Can anyone provide information for SGI, NeXT, other UNIX hardware and
- any other PC soundcards?
-
-
-
- Macintosh Audio Hardware - an overview
-
- * Description: ALL Macintosh computers come with the ability to play
- back sounds at any sample rate (sample rate conversion is done in
- software.) Older machines have 8 bit stereo output (hardware runs
- at 22254 samples/second). The newer machines have 16 bit stereo
- hardware running at 44100 samples/second.
- Most of the recent Macintosh computers come with sound input
- hardware. There are probably exceptions to this, but the older and
- some of the current low-end machines have 8 bit (linear) mono
- hardware running at 22254.54 samples/second. All of the PowerPC,
- AV, and the 500 series notebook computers come with 16 bit 44kHz
- stereo sampling hardware. They can also record at 22050
- samples/second. The sound manager implements an AGC (Automatic
- Gain Control) function for the 8 bit hardware. The drivers have a
- switch to turn off the AGC.
- There are a number of DSP vendors that support high quality audio.
- Generally this means quieter analog sections, and more IO formats
- (AES/EBU, for example). Try DigiDesign and Spectral Innovations.
- The software drivers for sound are described in "Inside Macintosh:
- Sound". If you want to see some sample code check out the sources
- for the Matlab "Sound and Image Toolbox". They can be found at
-
- ftp://ftp.apple.com/pub/malcolm/SoundAndImageToolbox.cpt.hqx
-
- Routines that play and record sounds using the toolbox are
- included (and interfaced to Matlab).
-
-
-
- PC Audio Hardware
-
- Note: new soundcards are becoming available all the time - the
- information below is definitely not up to date. Check out the
- following newsgroups for up-to-date information.
-
- * comp.sys.ibm.pc.soundcard
- * comp.sys.ibm.pc.soundcard.GUS
- * comp.sys.ibm.pc.soundcard.advocacy
- * comp.sys.ibm.pc.soundcard.games
- * comp.sys.ibm.pc.soundcard.misc
- * comp.sys.ibm.pc.soundcard.music
- * comp.sys.ibm.pc.soundcard.tech
-
- The Soundcard WWW Site is an excellent source of information:
-
- * http://www.wi.leidenuniv.nl/audio/
-
- A good source of programs and information for soundcards is SimTel:
-
- * http://www.acs.oakland.edu/oak/SimTel/win3/sound.html
-
- Additional information on PC soundcards is provided by the FAQ
- postings for the comp.sys.ibm.pc.soundcard.misc newsgroup. These are
- available by anonymous ftp from:
- ftp://rtfm.mit.edu/pub/usenet/comp.sys.ibm.pc.soundcard.misc/
-
- * Aria Soundcard FAQ
- * Aria Soundcard Support List
- * MIDI files software archives on the Internet
- * Turtle Beach sound cards FAQ
-
-
-
- Unix Audio Hardware
-
- Could someone please provide information on the audio capabilities of
- other Unix platforms?
-
- Sun standard audio port: SPARC I & II
-
- * Input and Output: 1 channel, 8 bit mu-law encoded, 8kHz sample
- rate. This provides telephone quality sampling.
-
- Sun DBRI audio port (SPARC 10 & 20)
-
- * Input and Output: Stereo (2 channels). 16-bit linear sampling.
- Multiple sample rates (48000, 44100, 37800, 32000, 22050, 18900,
- 16000, 11025, 9600, 8000 Hz)
-
- Silicon Graphics Audio
-
- The Silicon Graphics audio Frequently Asked Questions (FAQ) is the
- best place to get information on SGI audio capabilities and
- programming. It provides information on connecting the audio output,
- using the DSP capabilities, controlling the audio output, programming,
- useful software and more. It is available from:
-
- * WWW: http://www-viz.tamu.edu/~sgi-faq/faq/html/audio/
- * News: comp.sys.sgi.misc
- * Ftp: ftp://viz.tamu.edu/pub/sgi/faq/
-
- Ariel Signal Processors
-
- * Platform: Various
- * Description: A range of signal I/O, A/D, D/A and DSP products are
- available. There are too many to list.
- * Contact: Ariel Corp.
- 433 River Road, Highland Park, NJ 08904.
- Ph: 908-249-2900 Fax: 908-249-2123 DSP BBS: 908-249-2124
-
-
- ___________________________________________________________________________
-
- Q2.7: How do I convert to/from mu-law format?
-
- Mu-law coding is a form of compression for audio signals including
- speech. It is widely used in the telecommunications field because it
- improves the signal-to-noise ratio without increasing the amount of
- data. Typically, mu-law compressed speech is carried in 8-bit samples.
- It is a companding technique. That means that it carries more
- information about the smaller signals than about larger signals.
-
- On SUN Sparc systems have a look in the directory /usr/demo/SOUND.
- Included are table lookup macros for ulaw conversions. [Note however
- that not all systems will have /usr/demo/SOUND installed as it is
- optional - see your system admin if it is missing.]
-
- OR, here is some sample conversion code in C.
-
- /**
- ** Signal conversion routines for use with Sun4/60 audio chip
- **/
-
- #include <stdio.h>
-
- unsigned char linear2ulaw(/* int */);
- int ulaw2linear(/* unsigned char */);
-
- /*
- ** This routine converts from linear to ulaw
- **
- ** Craig Reese: IDA/Supercomputing Research Center
- ** Joe Campbell: Department of Defense
- ** 29 September 1989
- **
- ** References:
- ** 1) CCITT Recommendation G.711 (very difficult to follow)
- ** 2) "A New Digital Technique for Implementation of Any
- ** Continuous PCM Companding Law," Villeret, Michel,
- ** et al. 1973 IEEE Int. Conf. on Communications, Vol 1,
- ** 1973, pg. 11.12-11.17
- ** 3) MIL-STD-188-113,"Interoperability and Performance Standards
- ** for Analog-to_Digital Conversion Techniques,"
- ** 17 February 1987
- **
- ** Input: Signed 16 bit linear sample
- ** Output: 8 bit ulaw sample
- */
-
- #define ZEROTRAP /* turn on the trap as per the MIL-STD */
- #define BIAS 0x84 /* define the add-in bias for 16 bit samples */
- #define CLIP 32635
-
- unsigned char
- linear2ulaw(sample)
- int sample; {
- static int exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
- 4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
- 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
- 5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
- 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
- 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
- 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
- 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
- 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
- int sign, exponent, mantissa;
- unsigned char ulawbyte;
-
- /* Get the sample into sign-magnitude. */
- sign = (sample >> 8) & 0x80; /* set aside the sign */
- if (sign != 0) sample = -sample; /* get magnitude */
- if (sample > CLIP) sample = CLIP; /* clip the magnitude */
-
- /* Convert from 16 bit linear to ulaw. */
- sample = sample + BIAS;
- exponent = exp_lut[(sample >> 7) & 0xFF];
- mantissa = (sample >> (exponent + 3)) & 0x0F;
- ulawbyte = ~(sign | (exponent << 4) | mantissa);
- #ifdef ZEROTRAP
- if (ulawbyte == 0) ulawbyte = 0x02; /* optional CCITT trap */
- #endif
-
- return(ulawbyte);
- }
-
- /*
- ** This routine converts from ulaw to 16 bit linear.
- **
- ** Craig Reese: IDA/Supercomputing Research Center
- ** 29 September 1989
- **
- ** References:
- ** 1) CCITT Recommendation G.711 (very difficult to follow)
- ** 2) MIL-STD-188-113,"Interoperability and Performance Standards
- ** for Analog-to_Digital Conversion Techniques,"
- ** 17 February 1987
- **
- ** Input: 8 bit ulaw sample
- ** Output: signed 16 bit linear sample
- */
-
- int
- ulaw2linear(ulawbyte)
- unsigned char ulawbyte;
- {
- static int exp_lut[8] = {0,132,396,924,1980,4092,8316,16764};
- int sign, exponent, mantissa, sample;
-
- ulawbyte = ~ulawbyte;
- sign = (ulawbyte & 0x80);
- exponent = (ulawbyte >> 4) & 0x07;
- mantissa = ulawbyte & 0x0F;
- sample = exp_lut[exponent] + (mantissa << (exponent + 3));
- if (sign != 0) sample = -sample;
-
- return(sample);
- }
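-
- To see the companding effect described at the start of this answer,
- here is a compact, self-contained sketch of the same encoding. It is
- an illustration and not a replacement for the routines above: it finds
- the exponent with a loop instead of the lookup table, omits the
- optional zero trap, and uses the hypothetical names mulaw_encode and
- mulaw_decode. The bias and clip constants are the ones used above.
-
```c
#define MULAW_BIAS 0x84    /* same add-in bias as the routines above */
#define MULAW_CLIP 32635   /* same clipping level as the routines above */

/* Signed 16 bit linear sample -> 8 bit mu-law code (no zero trap). */
unsigned char mulaw_encode(int sample)
{
    int sign = 0, exponent = 7, mantissa, mask;

    if (sample < 0) {            /* convert to sign-magnitude */
        sign = 0x80;
        sample = -sample;
    }
    if (sample > MULAW_CLIP)
        sample = MULAW_CLIP;
    sample += MULAW_BIAS;

    /* The exponent is the position of the highest set bit, counted
     * down from bit 14 (exponent 7) to bit 7 (exponent 0). */
    for (mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    mantissa = (sample >> (exponent + 3)) & 0x0F;
    return (unsigned char)~(sign | (exponent << 4) | mantissa);
}

/* 8 bit mu-law code -> signed 16 bit linear sample. */
int mulaw_decode(unsigned char code)
{
    int sign, exponent, mantissa, sample;

    code = (unsigned char)~code;
    sign = code & 0x80;
    exponent = (code >> 4) & 0x07;
    mantissa = code & 0x0F;
    sample = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
    return sign ? -sample : sample;
}
```
-
- With this sketch a small input of 100 decodes to 104 (an error of 4),
- while a large input of 30000 decodes to 30076 (an error of 76): the
- quantisation steps are finer for small signals than for large ones.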
-
-
- ___________________________________________________________________________
-
- Q2.8: Signal Processing Software
-
- [Note: Question 1.9 lists speech laboratory environments and audio
- editors, many of which provide basic and advanced signal processing
- capabilities.]
-
- Signal Processing Products
-
- * SigLib from Numerix Ltd.
-
- On the Web
-
- The following sites provide lists of useful DSP software. Not all the
- software is directly applicable to speech processing.
-
- comp.dsp FAQ
- http://www.bdti.com/faq/dsp_faq.htm
-
- DSP Internet Resources
- http://www.eg3.com/
- http://www.eg3.com/dsp.htm
-
- Poynton's Digital Signal Processing Resource List
- http://www.inforamp.net/~poynton/Poynton-dsp.html
-
- WWW Pages Relating to Sound Computation
- http://datura.cerl.uiuc.edu/netstuff/sigsoundLinks.html
-
- Yahoo - Signal and Image Processing
- http://www.yahoo.com/Science/Engineering/Electrical_Engineering/Signal_and_Image_Processing/
-
- Sound Related Resources
- http://pscinfo.psc.edu/~geigel/menus/sound.html
-
- SPLIB: Signal Processing url LIBrary
- http://jazz.rice.edu/splib/
-
- Wavelet's Home Page
- http://www.mat.sbg.ac.at/~uhl/wav.html
-
-
-
- SigLib from Numerix Ltd.
-
- * Platform: Windows, Unix and all major DSPs
- * Description: SigLib is an ANSI C Source DSP Library and includes
- functions for the following areas : spectrum analysis, windowing,
- filtering (fixed and adaptive coefficient), convolution,
- correlation, covariance, signal generation, statistical analysis,
- regression analysis, communications and modulation, digital
- effects, vectors processing, control, graphics and file I/O.
- Detailed product information and a description of the application
- of SigLib to speech processing is provided on the Numerix Ltd. WWW
- site.
- * Availability: A free demonstration of SigLib V2.0 is available
- from the Numerix Ltd. WWW site. Educational discount is available
- for SigLib.
- * Contact: Numerix Ltd.,
- 157 Sileby Road, Barrow-on-Soar, Leics, LE12 8LW, UK.
- Phone/Fax : +44 (0)1509 413195
- Email: numerix@numerix.co.uk
- WWW: http://www.numerix.co.uk/
-
-
- ___________________________________________________________________________
-
- Speech Coding and Compression
-
- comp.speech FAQ Section 3
-
- * SpeechLinks: Speech Coding
- * Q3.1: Speech compression techniques
- * Q3.2: Information on speech coding and compression
- * Q3.3: Speech Compression / Coding Software
-
-
- ___________________________________________________________________________
-
- Q3.1: Speech compression techniques
-
- Provided by Tony Robinson:
-
- The aim of speech compression is to produce a compact representation
- of speech sounds such that when reconstructed it is perceived to be
- close to the original. The two main measures of closeness are
- intelligibility and naturalness.
-
- The standard reference point is toll quality speech. This is the same
- as what would be expected over a telephone line: for example, speech
- coded at 8 kHz using 8 bit ulaw coding and a maximum frequency of
- about 3.3 kHz. This is a bit rate of 64 kbps, and as such represents a
- compressed form over (say) 16 bit, 16 kHz speech which is the standard
- in speech recognition work.
-
- ulaw coding does not exploit the (normally large) sample to sample
- correlations found in speech. ADPCM is the next family of speech
- coding techniques, and does exploit this redundancy by using a simple
- linear filter to predict the next sample of speech. The resulting
- prediction error is typically quantised to 4 bits thus giving a bit
- rate of 32 kbps (see, for example, the software in Q3.3: 32 kbps
- ADPCM, G.711/721/723 Compression, shorten). The advantages of ADPCM
- are that it is simple to implement and has very low delay.
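-
- The idea of predicting each sample and sending only a 4 bit prediction
- error can be sketched as below. This is a deliberately simplified
- fixed-step DPCM, not G.721 or any of the packages in Q3.3: the
- predictor is just the previous reconstructed sample, and the step size
- DPCM_STEP is a hypothetical constant. Real ADPCM coders also adapt the
- step size (and usually the predictor) as the signal changes.
-
```c
#define DPCM_STEP 64   /* hypothetical fixed quantiser step size */

typedef struct {
    int predicted;     /* prediction of the next sample */
} dpcm_state;

/* Encode one sample as a 4 bit signed code in the range [-8, 7]. */
int dpcm_encode(dpcm_state *st, int sample)
{
    int error = sample - st->predicted;
    int code;

    /* Round the prediction error to the nearest multiple of the step. */
    if (error >= 0)
        code = (error + DPCM_STEP / 2) / DPCM_STEP;
    else
        code = -((-error + DPCM_STEP / 2) / DPCM_STEP);
    if (code > 7)
        code = 7;
    if (code < -8)
        code = -8;

    /* Update the prediction exactly as the decoder will. */
    st->predicted += code * DPCM_STEP;
    return code;
}

/* Decode one 4 bit code back to a reconstructed sample. */
int dpcm_decode(dpcm_state *st, int code)
{
    st->predicted += code * DPCM_STEP;
    return st->predicted;
}
```
-
- At 8000 samples per second, 4 bits per sample gives the 32 kbps figure
- quoted above. The encoder and decoder stay in step because both update
- the prediction from the transmitted code, never from the original
- sample.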
-
- To obtain more compression, specific properties of the speech signal
- must be modelled. The main assumption is known as the source filter
- model of speech production. This assumes that a source (voicing or
- fricative excitation) is passed through a filter (the vocal tract
- response) to produce the speech. The simplest implementation of this
- is known as an LPC synthesiser (e.g. LPC10e). At every frame the speech
- is analysed to compute the filter coefficients, the energy of the
- excitation, a voicing decision, and a pitch value if voiced. At the
- decoder a regular set of pulses for voiced speech or white noise for
- unvoiced speech is passed through the linear filter and multiplied by
- the gain to produce the speech. This is a very efficient system and
- typically produces speech coded at 1200-2400bps. With clever acoustic
- vector prediction this can be reduced to 300-600bps. The disadvantages
- are a loss of naturalness over most of the speech and occasionally a
- loss of intelligibility.
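-
- The decoder side of this model can be sketched for a single frame as
- follows. This is an illustration of the source filter idea only, not
- LPC-10 itself: the function name, the sign convention
- s[n] = gain * e[n] + sum of a[k] * s[n-k], and the small built-in
- noise generator are all assumptions. The filter memory starts at zero
- for each call, whereas a real decoder carries it from frame to frame.
-
```c
#include <stddef.h>

/* Synthesise n samples of one frame into out[].
 *   a[0..order-1]  predictor coefficients for s[n-1] .. s[n-order]
 *   gain           excitation gain
 *   voiced         non-zero: pulse train; zero: white noise
 *   pitch_period   pulse spacing in samples (used only when voiced) */
void lpc_synthesize(const double *a, size_t order, double gain,
                    int voiced, size_t pitch_period,
                    double *out, size_t n)
{
    unsigned long seed = 12345UL;   /* fixed seed: repeatable noise */
    size_t i, k;

    for (i = 0; i < n; i++) {
        double excitation, y;

        if (voiced) {
            /* one unit pulse at the start of every pitch period */
            excitation = (pitch_period > 0 && i % pitch_period == 0)
                             ? 1.0 : 0.0;
        } else {
            /* simple linear congruential generator, scaled to [-1, 1) */
            seed = seed * 1103515245UL + 12345UL;
            excitation = (double)((seed >> 16) & 0x7FFF) / 16384.0 - 1.0;
        }

        /* all-pole filter: add weighted past outputs to the excitation */
        y = gain * excitation;
        for (k = 1; k <= order && k <= i; k++)
            y += a[k - 1] * out[i - k];
        out[i] = y;
    }
}
```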
-
- The CELP family of coders compensates for the lack of quality of the
- simple LPC model by using more information in the excitation. Each
- excitation vector in a codebook is tried and the index of the
- one that best matches the original speech is transmitted. This results
- in an increase in the bit rate to typically 4800-9600bps. Most speech
- coding research is currently directed towards CELP coders. (See, for
- example, CELP 3.2a, a TMS implementation, a G.728 LD-CELP vocoder, and
- the L&H implementation.)
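-
- The codebook search can be sketched as an exhaustive loop. In this
- simplified illustration each candidate is compared directly with a
- target vector and scaled by its best gain, whereas a real CELP coder
- first passes each excitation vector through the synthesis filter
- (and a perceptual weighting filter) before comparing. The name
- codebook_search and the flat array layout are assumptions.
-
```c
#include <stddef.h>

/* Return the index of the codebook vector that, after scaling by its
 * optimal gain, is closest to target in the squared-error sense.
 * codebook holds n_vectors rows of dim values each; the chosen gain
 * is written to *best_gain. For a vector c and target t the optimal
 * gain is (c.t)/(c.c) and the remaining error is
 * (t.t) - (c.t)^2/(c.c), so the search maximises (c.t)^2/(c.c). */
size_t codebook_search(const double *target, const double *codebook,
                       size_t n_vectors, size_t dim, double *best_gain)
{
    size_t best_index = 0;
    double best_score = -1.0;
    size_t v, i;

    *best_gain = 0.0;
    for (v = 0; v < n_vectors; v++) {
        const double *c = codebook + v * dim;
        double cross = 0.0, energy = 0.0, score;

        for (i = 0; i < dim; i++) {
            cross += c[i] * target[i];
            energy += c[i] * c[i];
        }
        if (energy <= 0.0)
            continue;               /* skip all-zero vectors */

        score = cross * cross / energy;
        if (score > best_score) {
            best_score = score;
            best_index = v;
            *best_gain = cross / energy;
        }
    }
    return best_index;
}
```
-
- Only the winning index and a quantised gain need to be transmitted,
- which is why the bit rate stays low even though the excitation is
- richer than the pulse or noise source of the simple LPC model.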
-
-
- ___________________________________________________________________________
-
- Q3.2: Information on speech coding and compression
-
- Reference Books
-
- The following books cover speech coding/compression.
-
- * Douglas O'Shaughnessy, Speech Communication: Human and Machine,
- Addison Wesley series in Electrical Engineering: Digital Signal
- Processing, 1987.
- * Bishnu Atal, in F. Fallside and W. Woods (eds.), Computer Speech
- Processing. London: Prentice/Hall International, 1985.
- * N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall,
- ISBN 0-13-211913-7 01, 1984.
- * W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and Synthesis,
- Elsevier, Amsterdam, 1995.
- Contents, preface etc on the WWW:
- http://www.elsevier.nl/section/engtech/scs/menu.htm
- * Thomas P. Barnwell, Kambiz Nayebi and Craig H Richardson, Speech
- Coding: A Computer Laboratory Textbook, John Wiley and Sons Inc,
- 1996.
- * Schuyler R Quackenbush, Tom P Barnwell III, Mark A Clements,
- Objective Measures of Speech Quality, Prentice-Hall, 1988.
-
- And there are good tutorial articles.
-
- * Makhoul, J. "Linear Prediction: A Tutorial Review." Proc. of the
- IEEE 63 (1975): 561 - 580.
-
- On the WWW
-
- comp.compression FAQ
- Includes a few questions and answers on the compression of
- speech.
- ftp://rtfm.mit.edu/pub/usenet/comp.compression/
-
- Tony Robinson's Speech Analysis Course
- A complete course on speech analysis, including some stuff on
- speech coding.
- http://svr-www.eng.cam.ac.uk/~ajr/SA95/
- http://svr-www.eng.cam.ac.uk/~ajr/SA95/node78.html
-
- ITU Coding Standards
- Members of the ITU (International Telecommunications Union) can
- obtain copies of the Series G Recommendations (including
- G.711/721/723/728) from the ITU WWW site (http://www.itu.ch/)
- and from http://www.itu.ch/itudoc/itu-t/rec/g/g700-799.html.
-
- Jason Woodard's Speech Coding Page
- Introduction to speech coding plus information on a series of
- speech coding standards.
- http://www-mobile.ecs.soton.ac.uk/speech_codecs/index.html
-
- WWW searchable online bibliography for Phonetics and Speech
- Technology
- Over 8000 entries provided by Institut fur Phonetik at Johann
- Wolfgang Goethe-Universitat Frankfurt.
- http://www.uni-frankfurt.de/~ifb/bib_engl.html
-
- Ciaran McElroy's Speech Coding Page
- Introduction to many types of speech coding.
- http://wwwdsp.ucd.ie/speech/tutorial/speech_coding/speech_tut.html
-
- Examples of speech coding
-
- Nam Phamdo's Speech Coding Demonstration
- Examples of ADPCM, LD-CELP, CELP, LPC10 and CELP coding and
- coding over a noisy channel.
- http://admii.arl.mil/~fsbrn/phamdo/speech_demo.html
-
- Phil Karn's Digital/Analog Voice Demo
- Examples of several speech coding systems.
- http://www.qualcomm.com/people/pkarn/voicedemo/
-
-
- ___________________________________________________________________________
-
- Q3.3: Speech Compression / Coding Software
-
- The following speech compression software is described in the FAQ.
-
- * 32 kbps ADPCM
- * Castleton Network Systems - G.729 Voice Coder
- * CELP 3.2a & LPC-10
- * 8 Kbit/s CELP on the TMS320C5x family of DSP chips
- * CyberVoice
- * Rockwell's DigiTalk
- * File format conversion
- * G.711/721/723 Compression
- * G.728 LD-CELP vocoder
- * G.728 Compression
- * GSM 06.10 Compression
- * Lernout & Hauspie Speech Coding (5 products)
- * Lernout & Hauspie Speech Coding SDK
- * MPEG Audio
- * shorten - a lossless compressor for speech signals
- * Sipro Lab Telecom Inc. Coding
- * Sonarc: Digital Audio Compression
- * StarAudio Compressor/Player
- * TrueSpeech from DSP Group
- * U.S.F.S. 1016 CELP vocoder for DSP56001
- * ToolVox from Voxware
-
-
-
- 32 kbps ADPCM
-
- * Platform: SGI and Sun Sparcs
- * Description: 32 kbps ADPCM C-source code (G.721 compatibility is
- uncertain)
- * Contact: Jack Jansen
- * Availability: http://www.cwi.nl/ftp/audio/adpcm.shar
-
-
-
- Castleton Network Systems - G.729 Voice Coder
-
- * Platform: TI TMS320C5x DSP
- * Description: G.729, also called CS-ACELP (Conjugate-Structure
- Algebraic Code Excited Linear Prediction), is a state-of-the-art
- voice compression ITU (International Telecommunications Union)
- standard that can be used in a wide range of applications
- including wireless communications, digital satellite systems,
- packetized speech and digital leased lines. G.729 provides 8000
- bits/s bandwidth for compressed speech at toll quality (equivalent
- to G.726 32 kbit/s ADPCM under clean channel condition). Also,
- G.729 has lower complexity and lower bit rate than G.728.
- The Castleton G.729 implementation provides a bit-exact
- implementation of the ITU standard on a single TI TMS320C5x DSP.
- The software is C callable and fully re-entrant, which allows easy
- interfacing and multi-channel capability. The encoder and decoder
- are fully independent, therefore, a DSP device can run a number of
- full-duplex or half-duplex channels. The coder and the decoder are
- able to operate under a real-time task switching kernel.
- * Cost and Availability: Contact Castleton Network Systems.
- * Contact: Castleton Network Systems Corporation
- 350 Terry Fox Drive, Kanata, Ontario, Canada K2K 2W5
- Ph: 613-591-8786, Fax: 613-591-8783
- Email: inquire@castleton.com
- WWW: http://www.castleton.com/
-
-
-
- CELP 3.2a & LPC-10
-
- * Platform: Sun (the makefiles and source can be modified for other
- platforms)
- * Description: CELP is a lossy compression technique. This is the US
- Department of Defense's Federal-Standard-1016 based 4800 bps code
- excited linear prediction voice coder version 3.2a (CELP 3.2a),
- supplied as Fortran and C simulation source code.
- * Availability: By anonymous ftp from:
- ftp://ftp.super.org/pub/speech/celp_3.2a.tar.Z
- Or from the comp.speech ftp server
- ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.Z
- ftp://svr-ftp.eng.cam.ac.uk/comp.speech/coding/celp_3.2a.tar.gz
- LPC-10 Fortran source code is also available:
- ftp://ftp.super.org/pub/speech/lpc10-1.0.tar.gz
- Here is a modified LPC-10 release that includes ANSI C source:
- http://www.arl.wustl.edu/~jaf/lpc/
- * Documentation: The following articles describe the
- Federal-Standard-1016 4.8-kbps CELP coder:
- + Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
- Welch, "The Federal Standard 1016 4800 bps CELP Voice Coder,"
- Digital Signal Processing, Academic Press, 1991, Vol. 1, No.
- 3, p. 145-155.
- + Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C.
- Welch, "The DoD 4.8 kbps Standard (Proposed Federal Standard
- 1016)," in Advances in Speech Coding, ed. Atal, Cuperman and
- Gersho, Kluwer Academic Publishers, 1991, Chapter 12, p.
- 121-133.
- The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400
- bps linear prediction coder (LPC-10) was republished as a Federal
- Information Processing Standards Publication 137 (FIPS Pub 137).
- It is described in:
- + Thomas E. Tremain, "The Government Standard Linear Predictive
- Coding Algorithm: LPC-10," Speech Technology Magazine, April
- 1982, p. 40-49.
- There is also a section about FS-1015 in the book:
- + Panos E. Papamichalis, Practical Approaches to Speech Coding,
- Prentice-Hall, 1987.
- The voicing classifier used in the enhanced LPC-10 (LPC-10e) is
- described in:
- + Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/Unvoiced
- Classification of Speech with Applications to the U.S.
- Government LPC-10E Algorithm," Proceedings of the IEEE Intl.
- Conf. on Acoustics, Speech, and Signal Processing, 1986, p.
- 473-6.
- * Vendors:
- Realtime DSP code for FS-1015 and FS-1016 is sold by:
- + John DellaMorte, DSP Software Engineering
- 165 Middlesex Tpk, Suite 206, Bedford, MA 01730, USA
- Ph: 1-617-275-3733 Fax: 1-617-275-4323
- Email: dspse.bedford@channel1.com
- DSP Software Engineering's FS-1016 code can run on a DSP
- Research's Tiger 30 (a PC board with a TMS320C3x and analog
- interface suited to development work).
- + DSP Research
- 1095 E. Duane Ave, Sunnyvale, CA 94086, USA
- Ph: (408)773-1042 Fax: (408)736-3451
-
-
-
- 8 Kbit/s CELP on the TMS320C5x family of DSP chips
-
- * Description: For low bandwidth transmission of voice, compact
- voice storage for archival purposes, low-cost digital answering
- machines and efficient storage for voice mail. Features :
- + near toll quality at 8 Kb/s.
- + Variable rate option with 1 Kb/s silence encoding.
- + Implemented on a fixed-point processor for lower system cost.
- + Attractive licensing scheme.
- + Future availability of 4 Kb/s.
- + Custom rates possible.
- Capacity :
- + Two half-duplex or one full duplex channels on the 20 MIPS
- 'C5x (at 95% and 55% CPU utilization respectively).
- + Two full duplex channels on the 28.6 MIPS 'C5x (at 77% CPU
- utilization).
- + Requires 9 K-words program memory and 3 K-words data memory.
- + Decoding in real-time on a 486 class CPU.
- * Contact:
-
- CVI Inc.
- 443 Vienna Cres. North Vancouver, BC, Canada V7N 3B3
- Tel: (604) 987 1719 Fax: (604) 986 8139
- Email: cvi@extropia.wimsey.com
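-
- The capacity figures quoted above can be cross-checked with a little
- arithmetic. The sketch below is not vendor code: the function and
- variable names are ours, and it assumes that CPU load scales linearly
- with the number of channels and inversely with the MIPS rating, which
- is only a rough approximation.

```python
# Rough cross-check of the quoted 'C5x utilization figures.
# Assumption (ours): load scales linearly with channel count and
# inversely with the processor's MIPS rating.

def mips_used(mips_rating, utilization):
    """MIPS consumed at a given fractional CPU utilization."""
    return mips_rating * utilization

# One full-duplex channel on the 20 MIPS part at 55% utilization.
full_duplex_mips = mips_used(20.0, 0.55)

# Predicted utilization for two full-duplex channels on the 28.6 MIPS part.
predicted = 2 * full_duplex_mips / 28.6

print(f"one full-duplex channel: {full_duplex_mips:.1f} MIPS")
print(f"two channels on the 28.6 MIPS part: {predicted:.0%}")
```

- Under that assumption the prediction for the 28.6 MIPS part comes out
- close to the 77% figure quoted in the capacity list.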
-
-
-
- CyberVoice
-
- * Description: Cybernetics InfoTech, Inc. offers the following
- products:
- + Telephone voice compression at 1.2, 2.4, 4.8 and 6.0 kbit/s
- with good-communications-quality to near-toll-quality coded
- voice;
- + Wideband voice (7-kHz bandwidth) compression at 16 kbit/s
- with near-original-quality coded voice;
- + Internet Voice E-mail software with voice editing,
- high-quality low-data-rate voice compression, fast/slow voice
- playback, and more.
- * Availability: C code and Windows .DLL for telephone voice
- compression and wideband voice compression are available for
- licensing.
- Real-time DSP codes are under development.
- Voice E-mail software is available for purchase and download from
- the CyberVoice home page.
- * Contact: Cybernetics InfoTech, Inc.
- 2 Professional Dr., #228, Gaithersburg, MD 20879
- WWW: http://www.cybit.com/
- E-mail: info@cybit.com
- Fax: 301-590-0359
-
-
-
- Rockwell's DigiTalk
-
- * Description: The DigiTalk coder operates at a sampling rate of
- 8KHz and transmits 223 bits of coded speech every 26ms, giving an
- overall bit rate of 8.577Kbps. The algorithm is based on
- analysis-by-synthesis predictive coding with vector-coded
- excitation, in which the excitation signal is optimized by
- minimizing the perceptually weighted error between the original
- and synthesized speech. More information and results of perceptual
- tests are available on the WWW.
- * Availability: See the WWW page:
- http://www.nb.rockwell.com/ref/digitalk/
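-
- The bit rate quoted in the description follows directly from the
- frame size and frame period. A minimal sketch of that arithmetic
- (the constant names are ours):

```python
# Bit rate implied by the quoted DigiTalk frame size and frame period.
BITS_PER_FRAME = 223
FRAME_SECONDS = 0.026      # 26 ms
SAMPLE_RATE_HZ = 8000

bit_rate = BITS_PER_FRAME / FRAME_SECONDS            # bits per second
samples_per_frame = SAMPLE_RATE_HZ * FRAME_SECONDS   # input samples per frame
bits_per_sample = BITS_PER_FRAME / samples_per_frame

print(f"{bit_rate / 1000:.3f} kbit/s")
print(f"{samples_per_frame:.0f} samples per frame")
print(f"{bits_per_sample:.2f} coded bits per input sample")
```

- The first line printed reproduces the 8.577 Kbps figure; the last
- shows that the coder spends only about one bit per input sample.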
-
-
-
- File format conversion
-
- * Platform: SUN OS?
- * Description: Conversion utility able to encode and decode between
- the following formats: G.723, G.721, A-law, u-law and linear.
- * Availability: By anonymous ftp from
-
- ftp://ftp.cwi.nl/pub/audio/ccitt-adpcm.tar.Z
-
-
-
- G.711/721/723 Compression
-
- * Description:
- + G.711 : CCITT u-law and A-law compression
- + G.721 : CCITT 32 kbps ADPCM coder
- + G.723 : CCITT 24 kbps and 40 kbps ADPCM coders
- * Availability: By email to itudoc@itu.ch, with
- GET ITU-3022
- as the *only* line in the body of the message.
- It is also available by anonymous ftp from:
-
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/G711_G721_G723.tar.Z
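-
- For readers unfamiliar with u-law, the idea behind the companding
- step can be sketched with the textbook continuous u-law curve. This is
- an illustration only and is not taken from the package above: G.711
- itself specifies a segmented (piecewise-linear) approximation that
- produces 8-bit codes, which this sketch does not implement. The
- function names are ours.

```python
import math

MU = 255  # value conventionally used for 8-bit u-law

def mulaw_compress(x, mu=MU):
    """Map a sample in [-1, 1] onto the companded range [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

# Small amplitudes are stretched, large ones squeezed, so that a uniform
# quantizer applied after compression gives finer steps to quiet sounds.
for x in (0.0, 0.01, 0.1, 0.5, 1.0):
    y = mulaw_compress(x)
    print(f"x={x:5.2f}  compressed={y:.4f}  restored={mulaw_expand(y):.4f}")
```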
-
-
-
- G.728 LD-CELP vocoder
-
- * Platform: Analog Devices ADSP-2171
- * Description: Real-time, full-duplex G.728 LD-CELP vocoder that
- runs on a single Analog Devices ADSP-2171. Source and object code
- available for a one-time license fee.
- * Contact:
-
- Cole Erskine
- Analogical Systems
- 299 California Avenue, Suite 120
- Palo Alto, CA 94306, USA
- Tel:(415) 323-3232 FAX:(415) 323-4222
- email: cole@analogical.com
-
-
-
- G.728 Compression
-
- * Description: G.728 low delay celp package written by Alex Zatsman
- of Analog Devices, Inc.
- * Availability: By anonymous ftp from
-
- ftp://dspsun.eas.asu.edu/pub/speech/ldcelp.tgz
-
-
-
- GSM 06.10 Compression
-
- * Platform: Unix; faster than real time on most Sun SPARCstations
- * Description: GSM 06.10 is a standardized lossy speech compression
- scheme employed by most European wireless telephones. It uses
- RPE/LTP (regular pulse excitation/long term prediction) coding to
- compress frames of 160 13-bit samples (8 kHz sampling rate, i.e. a
- frame rate of 50 Hz) into 260 bits.
- * Contact: GSM 06.10 support and implementation
- jutta@cs.tu-berlin.de, cabo@cs.tu-berlin.de
- * Availability: The following configurations are available by
- anonymous ftp:
-
- gzip compression from Germany:
- ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.7.tar.gz
-
- MS-DOS compression from Germany:
- ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/ddj/gsm-107.zip
-
- MS-DOS compression from USA:
- ftp://ftp.mv.com/pub/ddj/1194.12/gsm-105.zip
-
- * Misc: The WWW site is
-
- http://www.cs.tu-berlin.de/~jutta/toast.html
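-
- The frame figures in the description imply the coder's bit rate and
- compression ratio. A minimal sketch of the arithmetic, using only the
- numbers quoted above (the constant names are ours):

```python
# Bit rates implied by the GSM 06.10 frame figures quoted above.
SAMPLE_RATE_HZ = 8000
SAMPLES_PER_FRAME = 160
BITS_PER_SAMPLE_IN = 13
BITS_PER_FRAME_OUT = 260

frame_rate = SAMPLE_RATE_HZ / SAMPLES_PER_FRAME      # frames per second
input_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE_IN     # bit/s before coding
output_rate = frame_rate * BITS_PER_FRAME_OUT        # bit/s after coding
ratio = input_rate / output_rate

print(f"frame rate: {frame_rate:.0f} Hz")
print(f"input: {input_rate / 1000:.0f} kbit/s")
print(f"output: {output_rate / 1000:.0f} kbit/s")
print(f"compression ratio: {ratio:.0f}:1")
```

- That is, 104 kbit/s of 13-bit samples is reduced to 13 kbit/s, a ratio
- of 8:1 relative to the 13-bit input.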
-
-
-
- Lernout & Hauspie Speech and Music Coding Product Range
-
- * Product name: L&H.smc650: 32kbps ADPCM Speech coding
- + Implementation of ADPCM 32 kbps based on CCITT G721 standard.
- + Estimated quality: 4.1 MOS (Mean Opinion Score)
- + Hardware Example: Analog Devices ADSP2101
- + Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear
- signal with up to 16 bits per sample; 8 kHz sampling rate
- * Product name: L&H.smc550: LD-CELP 16 kbps speech coding
- + Proprietary implementation of LD-CELP 16 kbps based on CCITT
- G728 standard.
- + Estimated quality: 4.0 MOS (Mean Opinion Score)
- + Hardware Example: Motorola 5600X
- + Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear
- signal with up to 16 bits per sample; 8 kHz sampling rate
- * Product name: L&H.smc450: 16-17.5 kbps speech coding
- + Estimated Quality: 3.9 MOS (Mean Opinion Score)
- + Hardware Examples: Analog Devices ADSP2101, Intel 486 DX2/66
- MHz
- + Input / Output Signal: A-Law or mu-Law PCM (64 kbps); Linear
- signal with up to 16 bits per sample; 8 kHz sampling rate.
- * Product name: L&H.smc350: 4.8-9.6 kbps speech coding
- + Proprietary CELP based software for compression rates of 4.8
- kbps to 9.6 kbps
- + Estimated Quality: 3.5 MOS (Mean Opinion Score)
- + Hardware Examples: AT&T DSP32C
- + Input / Output signal: A-Law or mu-Law PCM (64 kbps); Linear
- signal with up to 16 bits per sample; 8 kHz or 11.025kHz
- sampling rate.
- * Product name: L&H.smc250: 2.4 kbps speech coding
- + Combination of multi band excitation and code book excited
- linear prediction.
- + Estimated Quality: 3.0 MOS (Mean Opinion Score).
- + Hardware Examples: Intel 486 DX2/66 MHz, Analog Devices
- ADSP2101
- + Input signal: A-Law or mu-Law PCM (64 kbps); Linear signal
- with 12-15 bits per sample; 8 kHz sampling rate.
- + Output signal: A-Law or mu-Law PCM (64 kbps); Linear signal
- with 12-15 bits per sample; 8 kHz sampling rate.
- * See also: L&H Speech Coding SDK
- * More Information: On the WWW: http://www.lhs.com/coding.html
- * Cost: Unknown
- * Contact: Lernout and Hauspie Speech Products
- 20 Mall Road, 4th Floor
- Burlington, MA 01803, USA
- Ph: +1-617-238-0960, Fax: +1-617-238-0986
- Email: sales@lhs.com
- WWW: http://www.lhs.com/
-
-
-
- Lernout & Hauspie Speech Coding SDK
-
- * Description: Windows based software development kit for
- integrating speech coding technology with Windows based PC
- applications.
- * Requirements: IBM-compatible 486 DX/33 MHz + 2MB RAM + MS DOS 5.0
- + MS Windows 3.1 (or higher) + Sound Blaster compatible sound
- board.
- * See also: L&H Speech Coding Products
- * More Information: On the WWW: http://www.lhs.com/coding.html
- * Cost: Unknown
- * Contact: Lernout and Hauspie Speech Products
- 20 Mall Road, 4th Floor
- Burlington, MA 01803, USA
- Ph: +1-617-238-0960, Fax: +1-617-238-0986
- Email: sales@lhs.com
- WWW: http://www.lhs.com/
-
-
-
- MPEG Audio
-
- MPEG (Moving Pictures Experts Group) defines standard methods for
- compression and transmission of digital video and audio. Detailed FAQs
- and WWW sites are available for MPEG:
-
- MPEG Pointers and Resources
- http://www.mpeg.org/
-
- FAQ by Luigi: http://www.crs4.it/~luigi/MPEG/mpegfaq.html
-
- FAQ by Frank Gadegast
- http://www.powerweb.de/mpeg/mpegfaq/
-
- FAQ by Chad Fogg
- http://www-plateau.cs.berkeley.edu/mpegfaq/MPEG-2-FAQ.html
-
- How to Install an MPEG Audio Player for your Web Navigator
- http://www.mpeg.org/index.html/MPEG-audio-player.html
-
- MPEG Audio Software on the WWW
-
- Audio and Music Applications for Silicon Graphics Systems
- Lists 4 MPEG audio players for SGI machines.
- http://reality.sgi.com/employees/cook/audio.apps/public.html
-
- MPEG-1 Audio Layer 3 encoder, decoder and FAQ
- From the Fraunhofer Institute
- http://www.iis.fhg.de/departs/amm/layer3/index.html
-
- MPEG-2 Audio FAQ from Philips
- http://www.keymodules.philips.com/MD/mpeg/faqmpeg2.htm
-
- MPEG-1 and MPEG-2 audio software
- Universitaet Hannover
- ftp://ftp.tnt.uni-hannover.de/pub/MPEG/audio/
-
- MPEG-1 Audio Layer 1 & 2 encoder and decoder
- Internet Underground Music Archive (IUMA)
- ftp://ftp.iuma.com/audio_utils/converters/source/
-
- Buddy Software Library: MPEG-1 Audio Layer 3 encoder and
- player
- http://www.buddy.org/softlib.html
-
- MPEG-1 Audio Layer 1 & 2 decoder and verifier at CCETT
- ftp://ftp.ccett.fr/pub/mpeg/audio_new/
-
- MPEG-2 Audio encoder and decoder at CCETT
- ftp://ftp.ccett.fr/pub/mpeg/mpeg2/
-
- MPEG Audio - MetaSound
-
- * Platform: MS Windows/3.1 and Windows/95
- * Description: MetaSound is a partial MPEG-1 software decoder which
- is designed to work with hardware video decoders. It can reduce
- the hardware cost by eliminating the need for a hardware audio
- decoder. MetaSound has so far been integrated with three
- hardware video decoders. Features:
- + Performance: For 486 DX4-100 machines or above, MetaSound can
- deliver FM quality (22 KHz) sound. For Pentium-90 or above
- machines, MetaSound requires 40% CPU bandwidth to deliver CD
- quality (44.1 KHz) sound.
- + Portability: it can take less than one month to port to new
- hardware video decoders.
- + Supported CD standards include Video CD 1.0, Video CD 2.0,
- and CDI.
- + User interface with full set of functions: volume control,
- stop, pause, forward, backward, mute, resume, select the
- previous/next program track (Video CD 2.0), randomly select a
- program track (Video CD 2.0).
- + Error recovery: can automatically skip erroneous bitstream data.
- * Contact: Meta Media, Inc.
- F8, #10-1, Ho-Ping East Rd. Sec. 1, Taipei, Taiwan, R.O.C.
- Ph: 011-886-2-369-3330, Fax: 011-886-2-369-3331
- Email: mmedia@ms4.hinet.net.tw
-
-
-
- shorten - a lossless compressor for speech signals
-
- * Platform: UNIX/DOS
- * Description: A fast waveform coder suitable for speech and music
- signals in a wide variety of file formats. The degree of
- compression is adjustable from lossless to three bits a sample.
- 16-bit 16 kHz speech generally attains 50% lossless compression, and
- 16:3 compression of CD-ROM quality speech is obtainable with only
- minor audible degradation.
- * Availability: Anonymous ftp - UNIX and DOS versions
-
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.gz
-
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.tar.Z
-
- ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/shorten.zip
-
-
-
- Sipro Lab Telecom Inc. Coding
-
- * Platform: Various processors
- * Description: Coding software for several International Standards
- plus two Proprietary standards.
- International Standards
- 1. PCS 1900 (a 13 kbps codec, established as a North American
- PCS standard)
- 2. Enhanced GSM (a 13 kbps codec)
- 3. G.723.1 (a dual-rate codec for the video phone market)
- 4. G.729 (8 kbps codec established as a multi-purpose
- international standard)
- 5. G.729 Annex A (8 kbps codec made for Digital Simultaneous
- Voice & Data transmission in the modem industry).
-
- Proprietary Standards
- 1. ACELP 8 v2.0 codec (flexible dual rate codec equipped with a
- VAD)
- 2. ACELP 4.8 codec
- * Contact: Sipro Lab Telecom Inc.
- 770, Chemin Lucerne, Ville Mont-Royal (Quebec), H3R 2H6 CANADA
- Ph: (514) 737-5874, Fax: (514) 737-2327
- E-mail: sales@sipro.com
- WWW: http://www.sipro.com/
-
-
-
- Sonarc: Digital Audio Compression
-
- * Platform: DOS and Windows
- * Description: Sonarc provides reversible, variable-rate compression
- of audio signals, with a compression ratio that averages about
- 2:1. Supports monaural and stereo files, 8-bit and 16-bit files,
- and WAVE and VOC formats.
- * Availability: Shareware by Richard P. Sprague
- Speech Compression
- P.O. Box 1785, Wilsonville, OR, 97070-1785, USA
- Ph: (503) 263-3102
- Email: 76635.3652@compuserve.com
-
-
-
- StarAudio Compressor/Player
-
- * Platform: Win95
- * Description: Uses a time-domain process that delivers lossless
- decompressed data. Handles .wav files containing high-quality
- 16-bit audio data at any sampling rate. Requires no special
- hardware, and decompression runs in real time on most 486's and on
- any Pentium. The higher the sampling rate, the higher the
- compression ratio: at least 4:1 for 11 kHz data, and usually more
- than 7:1 for 44 kHz data. The full bandwidth of the signal is
- preserved with the default compression options. Other options
- allow the compression ratio to be increased further at the cost of
- a slight reduction in output quality. A decompression library is
- available for application development.
- * Demo: Download the shareware version of the program from the STR
- WWW site.
- * Misc: A technical paper is available in Word 6.0 format:
- ftp://ftp.speechtech.com/pub/speechtech/docs/audocw60.exe
- * Contact: Speech Technology Research Ltd.,
- Suite B - 1623 McKenzie Avenue, Victoria, B.C. V8N 1A6, Canada
- Ph: +1-250-477-0544
- Email: products@speechtech.com
- WWW: http://www.speechtech.com/home/speechtech/
-
-
-
- TrueSpeech from DSP Group
-
- * Description: TrueSpeech is a family of speech compression and
- decompression algorithms and software. It is designed for personal
- computers and personal communications devices. With the high
- compression ratios ranging from 15:1 to 27:1, TrueSpeech improves
- the storage and communications transmission of digital voice
- information and can be used in the integration of personal
- computers and telephones. TrueSpeech can be utilized in many
- products and applications such as:
- + Multimedia PCs
- + Sound cards and modems
- + Computer/telephony and teleconferencing
- + Voice mail systems and PBX systems
- + Wireless/cellular applications
- + Personal digital assistants
- + Games, Education
- + Video/cable and on-line services
- The TrueSpeech encoder is available for free in the Sound System
- of Windows 95 and Windows NT. The DSPG WWW pages have information
- on how to add TrueSpeech capability to your WWW pages.
- * Contact: DSP Group, Inc.
- 3120 Scott Boulevard, Santa Clara, CA 95054-3317, USA
- Phone: (408) 986-4300 Fax: (408) 986-4323
- Email: Webster@dspg.com
- WWW: http://www.dspg.com/index.html
-
-
-
- U.S.F.S. 1016 CELP vocoder for DSP56001
-
- * Platform: DSP56001
- * Description: Real-time U.S.F.S. 1016 CELP vocoder that runs on a
- single 27MHz Motorola DSP56001. Free demo software available for
- PC-56 and PC-56D. Source and object code available for a one-time
- license fee.
- * Contact:
-
- Cole Erskine
- Analogical Systems
- 299 California Avenue, Suite 120
- Palo Alto, CA 94306, USA
- Tel:(415) 323-3232 FAX:(415) 323-4222
- Email: cole@analogical.com
-
-
-
- ToolVox from Voxware
-
- * Platform: Windows; Mac (currently in beta) and Unix versions are
- coming soon
- * Description: ToolVox is a proprietary frequency domain speech
- coder. 11 KHz speech is coded to an average rate of between 5,000
- bits per second and 9,000 bps. Real-time compression algorithms
- available for 2,400 bps. 22 KHz playback, as well as an ultra-low
- bit rate 8 KHz codec, are coming soon. On playback, the time scale
- can be changed by a 5x factor, pitch can be modified over a 3
- octave range, and vocal personality can be modified using a
- transformation function called VoiceFonts(tm).
- * Misc 1: A SDK for Windows is available.
- * Misc 2: Demo software is available from the Voxware Inc WWW page:
- http://www.voxware.com/
- * Price: Basic toolkit is $895 US. OEM and mass distribution
- licenses are separate. Ordering information is provided on the
- Voxware WWW server.
- * Contact:
-
- Voxware, Inc.
- Ph: (609) 497-1212 Fax: (609) 497-2490
- Sales information: sales@voxware.com
- WWW: http://www.voxware.com/
-
-
- ___________________________________________________________________________
-
- Natural Language Processing
-
- comp.speech FAQ Section 4
-
- There is now a newsgroup specifically for Natural Language Processing:
- comp.ai.nat-lang. A FAQ posting is available for the group:
-
- ftp://rtfm.mit.edu/pub/usenet/comp.ai.nat-lang/Natural_Language_Processing_FAQ
-
- There is also a lot of useful information on Natural Language
- Processing in the comp.ai FAQ. That FAQ lists available software and
- useful references. It includes a substantial list of software,
- documentation and other info available by ftp.
-
- The FAQ has information on the following:
-
- * Q4.1: NLP References and Books
- * Q4.2: NLP Software
-
-
- ___________________________________________________________________________
-
- Q4.1: NLP References and Books
-
- Take a look at the FAQ for the "comp.ai" newsgroup as it also includes
- some useful references.
-
- * James Allen: Natural Language Understanding, (Benjamin/Cummings
- Series in Computer Science) Menlo Park: Benjamin/Cummings
- Publishing Company, 1987.
- + This book consists of four parts: syntactic processing,
- semantic interpretation, context and world knowledge, and
- response generation.
- * G. Gazdar and C. Mellish, Natural Language Processing in Prolog,
- Addison Wesley, 1989
- * G. Gazdar and C. Mellish, Natural Language Processing in Lisp,
- Addison Wesley, 1989
- * G. Gazdar and C. Mellish, Natural Language Processing in Pop11,
- Addison Wesley, 1989
- + Emphasis on parsing, especially unification-based parsing,
- lots of details on the lexicon, feature propagation, etc.
- Fair coverage of semantic interpretation, inference in
- natural language processing, and pragmatics; much less
- extensive than in Allen's book, but more formal. There are
- three versions, one for each programming language listed
- above, with complete code.
- * Shapiro, Stuart C.: Encyclopedia of Artificial Intelligence Vol.1
- and 2. New York: John Wiley & Sons, 1990.
- + There are articles on the different areas of natural language
- processing which also give additional references.
- * Paris, Ce'cile L.; Swartout, William R.; Mann, William C.: Natural
- Language Generation in Artificial Intelligence and Computational
- Linguistics. Boston: Kluwer Academic Publishers, 1991.
- + The book describes the most current research developments in
- natural language generation and discusses all aspects of the
- generation process. It comprises three sections: one on text
- planning, one on lexical choice, and one on grammar.
- * Readings in Natural Language Processing, ed. by B. Grosz, K. Sparck
- Jones and B. Webber, Morgan Kaufmann, 1986
- + A collection of classic papers on Natural Language
- Processing. Fairly complete at the time the book came out
- (1986) but now seriously out of date. Still useful for ATNs,
- etc.
- * Klaus K. Obermeier, Natural Language Processing Technologies in
- Artificial Intelligence: The Science and Industry Perspective,
- Ellis Horwood Ltd, John Wiley & Sons, Chichester, England, 1989.
-
- The following are extensive bibliographies related to NLP:
-
- * Computational Parsing : Syntactic Analysis, Semantic Analysis,
- Semantic Interpretation, Parsing Algorithms, Parsing Strategies :
- BIBLIOGRAPHY, by Conrad F. Sabourin 1994, 2 volumes, 1029p, ISBN
- 2-921173-02-6, INFOLINGUA inc., P.O. Box 187 Snowdon, Montreal,
- H3X 3T4, Canada.
- * Computational Text Understanding : Natural Language Programming,
- Argument Analysis : BIBLIOGRAPHY, by Conrad F. Sabourin 1994,
- 657p, ISBN 2-921173-06-9, INFOLINGUA inc., P.O. Box 187 Snowdon,
- Montreal, H3X 3T4, Canada.
- See also: http://gomer.mlink.net/infolingua.html
- * Computational Text Generation : Generation from data or Linguistic
- Structure, Text Planning, Sentence Generation, Explanation
- Generation : BIBLIOGRAPHY, by Conrad F. Sabourin with a survey
- article by Mark T. Maybury 1994, 649p, ISBN 2-921173-07-7,
- INFOLINGUA inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
- See also: http://gomer.mlink.net/infolingua.html
- * Natural Language Processing : Interfaces to Databases, to Expert
- Systems, to Robots, to Operating Systems, and to
- Question-Answering Systems : BIBLIOGRAPHY, by Conrad F. Sabourin,
- 1994, 2 volumes, 847p, ISBN 2-921173-08-5 INFOLINGUA inc., P.O.
- Box 187 Snowdon, Montreal, H3X 3T4, Canada
- See also: http://gomer.mlink.net/infolingua.html
-
- Journals
-
- The major journals of the field are
-
- * Computational Linguistics and Cognitive Science for the
- artificial intelligence aspects,
- * Cognition for the psychological aspects,
- * Language, Linguistics and Philosophy, and Linguistic Inquiry
- for the linguistic aspects.
- * Artificial Intelligence occasionally has papers on natural
- language processing.
-
- Conferences
-
- The major NLP conferences are
-
- * ACL: held annually
- * COLING: held biennially (every two years)
-
- Most AI conferences have an NLP track; AAAI, ECAI, IJCAI and the
- Cognitive Science Society conferences are usually interesting for NLP.
- CUNY is an important psycholinguistic conference. Other conferences
- include NELS, the conference of the Chicago Linguistic Society (CLS),
- WCCFL, LSA, the Amsterdam Colloquium, and SALT.
-
-
- ___________________________________________________________________________
-
- Q4.2: NLP Software
-
- Natural Language Software Registry (NLSR) - NLP Tools
-
- * The Natural Language Software Registry is available from the
- German Research Institute for Artificial Intelligence (DFKI) in
- Saarbruecken. Its purpose is to facilitate the exchange and
- evaluation of natural language processing software within the
- research community. To this end, the NLSR is cataloging natural
- language software projects, both commercial and non-commercial.
- The new updated and enlarged version contains more than 100
- descriptions of natural language processing software. Registry
- listings include:
- + speech signal processors, such as the Computerized Speech Lab
- (Kay Elemetrics)
- + morphological analyzers, such as PC-KIMMO (Summer Institute
- for Linguistics)
- + parsers, such as Alveytools (University of Edinburgh)
- + semantic and pragmatic analyzers, such as NLL (University of
- the Saarland, Germany)
- + generation programs, such as FUF (Ben Gurion University of
- the Negev)
- + knowledge representation systems, such as Rhet (University of
- Rochester)
- + multicomponent systems, such as ELU (ISSCO), PENMAN (ISI),
- Pundit (UNISYS), SNePS (SUNY Buffalo)
- + NLP-Tools, such as GULP (University of Georgia) or Linguist
- (Kansai Research Laboratory)
- + applications programs (misc.)
- * If you have developed a piece of software for natural language
- processing that other researchers might find useful, you can
- include it by returning the questionnaire available from the
- sources below.
- * ftp://ftp.dfki.uni-sb.de/pub/registry
- * e-mail: registry@dfki.uni-sb.de
- * Natural Language Software Registry
- Deutsches Forschungsinstitut fuer Kuenstliche Intelligenz (DFKI)
- Stuhlsatzenhausweg 3
- D-66123 Saarbruecken
- Germany
- * Other ftp sites are
-
- ftp://crlftp.nmsu.edu/pub/non-lexical/NL_Software_Registy
-
- ftp://dri.cornell.edu/pub/Natural_Language_Software_Registry
-
- Part of Speech Tagger
-
- * Description: A rule-based part of speech tagger developed by Eric
- Brill.
- * Availability: The tagger software, about 10 descriptive papers and
- related data are available by anonymous ftp from
- ftp://ftp.cs.jhu.edu/pub/brill/
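-
- As a loose illustration of what "rule-based" tagging means, the toy
- sketch below first gives each word a default tag and then applies
- contextual rewrite rules of the form "change tag A to tag B when the
- previous tag is C". It is not the distributed tagger's code or rule
- format; the lexicon, the single rule and all names are hypothetical.

```python
# Toy rule-based tagger: default tag per word, then contextual rewrites.
# The lexicon and the rule below are made up for illustration only.

LEXICON = {"we": "PRP", "want": "VBP", "won": "VBD",
           "to": "TO", "the": "DT", "race": "NN"}

# Each rule: (from_tag, to_tag, required_previous_tag)
RULES = [("NN", "VB", "TO")]

def tag(words):
    """Return (word, tag) pairs after applying every rule in order."""
    tags = [LEXICON.get(w.lower(), "NN") for w in words]  # unknown -> NN
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(words, tags))

print(tag("We want to race".split()))
print(tag("We won the race".split()))
```

- In the first sentence "race" follows "to" and is retagged as a verb;
- in the second it keeps its default noun tag.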
-
-
- ___________________________________________________________________________
-
- Copyright (c) 1993-6 by Andrew Hunt, all rights reserved.
- This FAQ may be posted to any USENET newsgroup, on-line service, or BBS as
- long as it is posted in its entirety and includes this copyright statement.
- This FAQ may not be distributed for financial gain.
- This FAQ may not be included in any collections or compilations
- without express permission from the author.
-
-
-
- ---
-
- Andrew Hunt
- Speech Applications Group
- Sun Microsystems Laboratories Ph: (978) 442-2681
- 2 Elizabeth Drive, MS UCHL03-207 Fax: (978) 250-5067
- Chelmsford, MA 01824, USA Email: andrew.hunt@east.sun.com
-