- VOICE DIGITIZATION AND REPRODUCTION ON THE
- IBM PC/XT AND PC/AT BUILT-IN SPEAKER
- --------------------------------------------
-
- Alan D. Jones July 1988
-
-
- The speaker on the PC and its associated driver circuitry are quite
- simple and crude, having been designed primarily for creating single
- square-wave tones of various audio frequencies. This speaker is typically
- driven by a pair of transistors used as a current amplifier, which is in
- turn driven directly by the output of a TTL gate. This results in only two
- possibilities of voltage across the voice coil: 0 volts and 5 volts. Any
- sound to be reproduced by this system must be reduced to an approximation
- in the form of a stream of constant-amplitude, variable-width rectangular
- pulses.
- Examination of a speech waveform on an oscilloscope display quickly
- tells us that it is not going to be possible to even remotely mimic this
- waveform under the above restrictions. Much of the information contained
- in the waveform is in the form of amplitude variations, and this is the
- one attribute we cannot reproduce. It is initially tempting to try to
- use the technique of the "class D" amplifier to create the waveform, using
- high-speed pulse width modulation and depending on the mechanical
- characteristics of the speaker and those of the human ear to provide the
- missing low-pass filtering. Assuming the sampling rate to be 8 kHz (based
- on the Nyquist criterion) and, to conserve memory, assuming the samples
- to contain only 4 bits of amplitude information (16 levels), we can see
- that data accumulates at a rate of 4k bytes per second, which is certainly
- acceptable. The problem comes when we try to play back the sound. Pulses
- occur at intervals of 125 microseconds, which doesn't seem too bad, but
- since each pulse can have 16 possible widths, it is necessary to time the
- pulses with a resolution of well under 8 microseconds. This is only a
- couple of instruction times on a 4.77 MHz XT, and even on a fast 80386
- it doesn't give the CPU much time between bits to shift bits, read and
- increment a pointer, check the pointer to see if it's done yet, etc., not
- to mention the difficulty of servicing unrelated interrupts.
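
The arithmetic behind that timing problem can be checked with a short sketch. The sampling rate, sample width, and CPU clock come from the text above; the function name and the dictionary keys are only illustrative.

```python
def pwm_budget(sample_rate_hz, bits_per_sample, cpu_clock_hz):
    """Work out the data rate and timing resolution for PWM playback."""
    levels = 2 ** bits_per_sample                    # possible pulse widths
    bytes_per_second = sample_rate_hz * bits_per_sample / 8
    pulse_interval_us = 1e6 / sample_rate_hz         # time between pulses
    resolution_us = pulse_interval_us / levels       # one width step
    cpu_clocks_per_step = cpu_clock_hz * resolution_us / 1e6
    return {
        "levels": levels,
        "bytes_per_second": bytes_per_second,
        "pulse_interval_us": pulse_interval_us,
        "resolution_us": resolution_us,
        "cpu_clocks_per_step": cpu_clocks_per_step,
    }


if __name__ == "__main__":
    # 8 kHz sampling, 4-bit samples, 4.77 MHz XT (figures from the text)
    for name, value in pwm_budget(8_000, 4, 4_770_000).items():
        print(f"{name}: {value}")
```

With these inputs one width step is 125/16 = 7.8125 microseconds, or roughly 37 CPU clocks at 4.77 MHz, which is only a handful of 8088 instructions.
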
- The search for simpler (but still usable) and less CPU-intensive
- methods of reproducing speech leads to the question of what information
- in the waveform we can discard without an unacceptable loss of
- intelligibility. My experiments with running speech signals through
- a graphic equalizer revealed that the lower-frequency components, those
- which are most visible to the eye on the oscilloscope, are actually of
- minimal importance in understanding speech. This is also demonstrated by
- the fact that a whisper is just as understandable as normal speech, but
- does not make use of vibrating vocal cords, which are the primary source
- of low-frequency components in the voice.
- The schematic created by printing the file SCHEMATC.PRT arose partly
- from the above observations and partly from trial-and-error. The circuit
- consists of two stages of voltage amplification with some high-pass
- filtering built into the coupling capacitors, followed by a differentiator.
- The output of the differentiator is fed to a voltage comparator, thus
- producing an output which has approximately the following relationship
- to the input from the microphone: if the derivative of the speech waveform
- is positive, then the output is logic zero; if the derivative of the speech
- waveform is negative, then the output is logic one. The transition timing
- at the output is entirely analog in nature; there is no synchronizing
- clock signal anywhere in the circuit.
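
The input/output relationship just described can be imitated in software. What follows is a rough discrete-time sketch, not a model of the actual circuit: the real comparator output is continuous in time, and the rule for a flat (zero-slope) input is an assumption made here, since the text does not say what the hardware does in that case. The function name is hypothetical.

```python
def encode_slope_sign(samples):
    """Return one bit per step: 0 where the waveform rose, 1 where it fell.

    Assumption: when two successive samples are equal, the previous
    output bit is held (the text does not define this case).
    """
    bits = []
    previous_bit = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur > prev:
            bit = 0          # positive derivative -> logic zero
        elif cur < prev:
            bit = 1          # negative derivative -> logic one
        else:
            bit = previous_bit
        bits.append(bit)
        previous_bit = bit
    return bits


if __name__ == "__main__":
    wave = [0, 1, 2, 1, 0, -1, 0]
    print(encode_slope_sign(wave))   # [0, 0, 1, 1, 1, 0]
```

Note that the amplitude of the input never appears in the output; only the direction of travel survives, which is the point of the technique.
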
- If the output of this circuit is connected directly to a speaker, the
- resulting sound will still be an understandable version of the input.
- Since the output consists of nothing but a digital bit stream, the job
- of the computer becomes that of simply recording and accurately reproducing
- this bit stream.
- The trimpot at the input of amplifier U3 is used to set the DC idle
- voltage output from the differentiator to somewhere near the threshold
- of comparator U4. There will be a considerable amount of noise at the output
- of U3, originating at the microphone and within the input circuitry of U1,
- and highly amplified by U1 and U2. The trimpot should be adjusted so that
- the comparator threshold is just outside the normal excursion of the noise
- signal ("off to one side"), otherwise "silence" at the microphone will
- become, at the speaker output from the computer, a loud hiss with a strong
- component at half the sampling frequency.
- I used LF356's for U1, U2, and U3, and an LM393 for U4. Everything is
- powered by +12 and ground. All amplifiers should have power supply bypass
- capacitors (not shown). The microphone is a 600 ohm dynamic type. The 12
- volt power supply should be quiet and well-regulated; the one in the PC is
- too noisy unless you use heavy filtering.
- The two programs, RECORD and PLAY, are used as follows: Attach the
- circuit to the CTS input on one of the PC's COM ports. Then type:
- RECORD <number> <filename> where <number> is the COM port number
- and <filename> is the name of the disk file to contain the voice data.
- RECORD will respond with "Press a key to start and stop." Press the space
- bar and start talking. Press the space bar again to end recording and write
- the data to disk. Play it back with PLAY <filename>. The sampling rate is
- about 16.5k bits per second. This means that about 30 seconds of voice will
- make a 64k disk file. This is a simple program; it runs out of steam at 64k.
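
The 30-second figure follows directly from the bit rate and the 64k limit. The sketch below checks it and shows one way a one-bit stream could be packed into bytes. The most-significant-bit-first ordering and the zero padding of the last byte are assumptions for illustration; the layout RECORD actually uses is not stated here.

```python
def seconds_of_speech(buffer_bytes, bit_rate_hz):
    """How long a buffer lasts at one bit per sample."""
    return buffer_bytes * 8 / bit_rate_hz


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, MSB first (assumed order)."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | (bit & 1)
        byte <<= 8 - len(chunk)      # pad a short final byte with zeros
        out.append(byte)
    return bytes(out)


def unpack_bits(data):
    """Inverse of pack_bits: yield bits MSB first from each byte."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


if __name__ == "__main__":
    print(round(seconds_of_speech(64 * 1024, 16_500), 1))   # 31.8
    print(pack_bits([1, 0, 1, 0, 0, 0, 0, 1]).hex())        # a1
```
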
- The programs both operate by reprogramming the 8253 timer chip to produce
- hardware interrupts at the 16.5 kHz rate. The interrupt service routine then
- manipulates the NAND gate driving the speaker based on bits read from the
- file. The 16.5 kHz rate was chosen by trial-and-error; this is the audible
- "point of diminishing returns", where a further increase in sampling rate
- didn't produce enough of an improvement to warrant the increased memory
- usage.
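
For readers who want to see where a rate of "about 16.5 kHz" comes from, here is a sketch of the timer arithmetic. It assumes the standard PC timer input clock of about 1.19318 MHz and a 16-bit divisor loaded as a low byte followed by a high byte. The exact divisor and timer mode used by RECORD and PLAY are not given in this text, so the value computed here is a plausible guess rather than a statement about those programs.

```python
PIT_INPUT_HZ = 1_193_182   # standard PC 8253 input clock, about 1.19318 MHz


def pit_divisor(target_hz):
    """Nearest integer divisor giving roughly target_hz interrupts/second."""
    divisor = round(PIT_INPUT_HZ / target_hz)
    if not 1 <= divisor <= 65535:
        raise ValueError("target rate out of range for a 16-bit divisor")
    return divisor


def divisor_bytes(divisor):
    """Split the divisor into the (low, high) bytes written to the timer."""
    return divisor & 0xFF, (divisor >> 8) & 0xFF


if __name__ == "__main__":
    d = pit_divisor(16_500)
    print(d, divisor_bytes(d))             # 72 (72, 0)
    print(round(PIT_INPUT_HZ / d, 2))      # actual rate in Hz
    # The normal 18.2 Hz clock tick corresponds to a divisor of 65536, so
    # a handler wanting to keep the time-of-day clock running would have
    # to pass along roughly one interrupt in every 65536 // d.
    print(65536 // d)                      # 910
```

A divisor of 72 gives an actual rate near 16.57 kHz, consistent with the "about 16.5k" figure quoted above.
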
- This technique is somewhat limited in its usefulness. It necessitates
- the writing of a "badly behaved" program which not only reprograms the timer
- chip but also totally hogs the CPU for the duration of the voice output.
- Nevertheless, it demonstrates a few interesting things about how humans hear
- speech. I first developed this circuit over a year ago as a rebuttal to
- someone who said "it couldn't be done". Not only can it be done, it is
- actually quite simple. Certainly the circuit could be improved, at the
- possible expense of increased complexity. I'm waiting to hear from some of
- you. If anyone has questions, especially about my sloppy code, I check
- for messages on CIS every three or four days.
-
- - Alan
-
- 74030,554
-