NetNews Usenet Archive 1992 #20

Internet Message Format | 1992-09-08 | 2.4 KB

Xref: sparky comp.dsp:2131 comp.compression.research:156
Path: sparky!uunet!wupost!sdd.hp.com!swrinde!network.ucsd.edu!qualcom.qualcomm.com!qualcom!rdippold
From: rdippold@qualcom.qualcomm.com (Ron Dippold)
Newsgroups: comp.dsp,comp.compression.research
Subject: Re: Looking for telephone quality audio compression
Message-ID: <rdippold.716077895@qualcom>
Date: 9 Sep 92 22:31:35 GMT
References: <BuBn7u.BFu@news.cso.uiuc.edu>
Sender: news@qualcomm.com
Organization: Qualcomm, Inc., San Diego, CA
Lines: 38
Nntp-Posting-Host: qualcom.qualcomm.com

ja51359@uxa.cso.uiuc.edu (axelrod) writes:
> I'm looking for a audio compression algorithm that will result
>in telephone quality reproduction. I.E. 4Khz bandwidth, limited dynamic
>range, average S/N ratio.
> I'm already familiar with using delta-fibonacci, delta-huffman
>techniques, but I'm looking for a more lossy algorithm that will give
>better compression results, more like on the order of 8:1 with 8-bit
>samples.
> How is the quality of CELP compression? I heard voices end up
>sounding robotic. I'd like something that sounds natural.

Our version of CELP, QCELP, sounds quite decent. If things aren't tuned just right, voices can get a "sharpness" to them. To my ear it sounds superior to standard telephone, and those I've called have been unable to tell whether I'm calling with the desk phone or the cellular phone unless we introduce plenty of noise into the system (at which point the voice starts sounding somewhat "bubbly" as the noise overwhelms our error correction).

We output 192 bits per 20 millisecond frame, which works out to 1200 bytes per second, or 4.3 megabytes per hour of speech. In addition, we do voice activity detection and can produce half, quarter, and eighth rate frames.
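
The full-rate figures above can be checked with a few lines of arithmetic. This is only a sketch: the function and constant names are made up for illustration, and it assumes a megabyte means 10^6 bytes, which is what the quoted 4.3 figure implies.

```python
# Frame parameters as stated in the post.
FRAME_BITS = 192   # bits per full-rate frame
FRAME_MS = 20      # frame duration in milliseconds


def full_rate_bytes_per_second(frame_bits=FRAME_BITS, frame_ms=FRAME_MS):
    """Bytes per second when every frame is sent at full rate."""
    frames_per_second = 1000 / frame_ms        # 50 frames each second
    return frame_bits * frames_per_second / 8  # bits -> bytes


def megabytes_per_hour(bytes_per_second):
    """Storage for one hour of speech, assuming 1 MB = 10**6 bytes."""
    return bytes_per_second * 3600 / 1_000_000


rate = full_rate_bytes_per_second()
print(rate)                      # 1200.0
print(megabytes_per_hour(rate))  # 4.32
```

The 4.32 result is the "4.3 megabytes per hour" quoted above before rounding.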
The voice activity factor of standard speech works out to about 0.6 with this method, which means the resulting data is only 60% of the size it would be if we forced full rate mode throughout: about 720 bytes per second, or 2.6 megs for an hour of speech.

Given your sampling rate of 8000 Hz with 8 bit samples, the uncompressed audio would be 28.8 megs for an hour of speech, so we're doing around 7:1 without even bothering with voice activity, and about 11:1 with it, including error correction.

We're doing all this in an ASIC, but it demonstrates that it's possible to get what you want with a version of CELP. At least it might be worth looking into.

-- 
I never made a mistake in my life. I thought I did once, but I was wrong.
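
For anyone wanting to re-derive the compression ratios in the post, here is a rough sketch of the arithmetic. The names are illustrative only, a megabyte is assumed to be 10^6 bytes, and the 0.6 voice activity factor is simply the figure quoted in the post, not something measured here.

```python
# Figures taken from the post; names are hypothetical.
SAMPLE_RATE_HZ = 8000             # telephone-band sampling rate
BYTES_PER_SAMPLE = 1              # 8-bit samples
FULL_RATE_BYTES_PER_SECOND = 1200 # 192 bits every 20 ms
VOICE_ACTIVITY_FACTOR = 0.6       # average fraction of full rate actually sent


def megabytes_per_hour(bytes_per_second):
    """Storage for one hour, assuming 1 MB = 10**6 bytes."""
    return bytes_per_second * 3600 / 1_000_000


def compression_ratio(raw_rate, coded_rate):
    """How many times smaller the coded stream is than the raw one."""
    return raw_rate / coded_rate


raw = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE                          # 8000 bytes/s
variable = FULL_RATE_BYTES_PER_SECOND * VOICE_ACTIVITY_FACTOR    # about 720 bytes/s

print(f"raw PCM:       {megabytes_per_hour(raw):.1f} MB/hour")
print(f"variable rate: {megabytes_per_hour(variable):.1f} MB/hour")
print(f"full rate:     {compression_ratio(raw, FULL_RATE_BYTES_PER_SECOND):.1f}:1")
print(f"variable rate: {compression_ratio(raw, variable):.1f}:1")
```

The full-rate ratio comes out a little under 7:1 and the variable-rate ratio a little over 11:1, which matches the rounded figures in the post.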