-
- SOFTWARE-BASED DIGITAL AUDIO ON PCs
- by David T. Chappell
-
- Copyright (C) 1991 ACM. All Rights Reserved
-
- This article originally appeared in the Proceedings of the 1991 ACM
- Computer Science Conference, March 1991. Copying is by permission of
- the Association for Computing Machinery.
-
-
- ABSTRACT
-
- Digital audio techniques were investigated with special
- concentration on computer applications. Experimentation showed that
- the Intel 8253 programmable interval timer, found on all IBM PCs and
- compatibles, can output digitized sound. The relationship between
- pulse amplitude modulation and pulse width modulation signals was
- found to be significant to this process. Feeding digitized sound
- data to the chip results in an inherent transfer from pulse amplitude
- modulation to pulse width modulation encoding. The relationship in
- signal modulation allows IBM PCs to play digitized sound without the
- use of extra hardware.
-
- Biography:
- David T. Chappell is a senior in the computer science curriculum
- at North Carolina State University. In addition, he participates in
- the co-op program and works for IBM. His interests are in the areas
- of advanced input/output, technological research, and scientific
- applications. David plans to attend graduate school in computer
- science or a related field.
-
-
- INTRODUCTION
- In the past few years, the computer industry has been slowly
- gaining interest in computer-generated speech and sound. In this
- area, the Apple Macintosh and Commodore Amiga provide built-in sound
- control hardware so that their machines can play digitized recordings
- with little effort from the programmer. IBM, however, has chosen not
- to include advanced sound capabilities in its line of personal
- computers. The market has shown that relatively few people will buy
- speech or sound add-ons from either IBM or third parties. Until such
- hardware is standardized, sound software will encounter great
- difficulty in gaining acceptance. It is in this regard, however, that
- mathematics, engineering, and software can come to the rescue: it is
- possible for a PC to play good quality sound without additional
- hardware.
-
-
- BACKGROUND
- Sound is transmitted through the air as a longitudinal wave. The
- source of the sound compresses the air in one area and this air
- compresses the air next to it while it moves back to its original
- position. As each air molecule is displaced from and returns to its
- normal position, the wave travels through the air. The movement that
- results from these repeated compressions and rarefactions is called
- the propagation of the wave [1].
- It is often convenient to use graphs to visually represent sound.
- Most pictures show a graph of the wave amplitude vs. time, where the
- amplitude represents the displacement of molecules from their original
- positions. [See figure 1.] Regular waves, such as sine waves, need
- not be depicted in such a manner but can be represented by just a
- frequency (or wavelength), amplitude (volume), and duration. For
- example, the SOUND statement in Microsoft BASIC has the parameters of
- frequency and duration. (Volume is not needed because the PC has only
- one volume.) The irregular waves that make up most of the sound we
- hear are more complicated and need to be represented by giving
- specific details on the amplitude at each point in time.
- Digitized sounds consist of the numerical values for the
- amplitude at regular time intervals. [See figure 2.] Along with the
- amplitude data, the number of samples taken per second must be
- recorded. By reproducing the amplitude changes at the original rate,
- the sound can then be played back. Much research has gone into
- digitized sound, and it is now possible to produce digitized
- recordings which, to the human ear, are identical to their analog
- counterparts.
- Once recorded, digital signals can be stored in a variety of
- formats. Pulse amplitude modulation (PAM) is the standard method of
- representing analog data. [See figure 3.] In this method, each piece
- of data represents the amplitude at one instant in time. Pulse width
- modulation (PWM) treats each piece of information as the duration of a
- pulse which starts at a regular frequency. [See figure 4.] In pulse
- code modulation (PCM), each bit records whether the amplitude is high
- or low at each point in time. [See figure 5.] PCM is the most common
- method in digital audio recording and is used by audio compact discs
- to achieve high-quality sound. Pulse position modulation (PPM)
- records the position of a brief pulse by giving the time duration
- before the pulse occurs [1]. [See figure 6.]
- Several brands of personal computers include dedicated sound
- chips. The Macintosh, Amiga, Atari ST, and other computers can easily
- produce high-quality sound by using dedicated hardware. On these
- computers, many programs are enhanced by the addition of sound and
- speech. Likewise, IBM and a multitude of other companies have
- produced boards that allow PCs to record and play back digitized sound.
- A lack of standards, however, limits the use of these PC boards, and
- few programs use them.
- Over the years, several attempts have been made to play digitized
- sound on PCs without special hardware. A number of these have
- appeared in the public domain, but few produce intelligible output. Commercial
- software has had greater success, but most of these programs produce
- rough speech. With the recent rise of interest in audio and video, a
- few developers have even produced good, intelligible speech and
- sounds. The work presented here rivals that of the best public domain
- and commercial successes.
-
-
- MATERIALS AND METHODS
- The Intel 8253 programmable interval timer, found on all IBM PCs
- and compatibles, is a flexible counter chip. It has three 16-bit
- channels. Each channel produces an output signal based on an input
- signal and a programmed 16-bit number. The chip's six modes can
- produce varying types of output. Table 1 summarizes the six modes.
- Refer to Rosch [2] or Sargent and Shoemaker [3] for more details.
-
-
-
-
- Table 1: Intel 8253 Operating Modes
-
- 0 - interrupt on terminal count
- 1 - programmable one-shot
- 2 - rate generator
- 3 - square-wave generator
- 4 - software-triggered strobe
- 5 - hardware-triggered strobe
-
- On IBM PCs, the 8253's first channel increments the time-of-day
- clock, the second channel refreshes the DRAMs, and the third channel
- sends sound to the speaker. The timer/counter of AT class machines is
- based on the Intel 8254-2 chip and is functionally equivalent to the
- 8253. The PS/2 line has an 8253 that is used the same way, except
- that a separate chip refreshes memory, and the second channel of the
- 8253 is used for diagnostics or is unassigned [2].
- The 8253 can only give two possible output states: 0 and 1.
- Possibly because of this limitation, the chip's use for sound
- production in PCs has typically been limited to being a square-wave
- generator (mode 3). When functioning as a square-wave generator, the
- chip's output frequency is equal to its input frequency (1.193 MHz on
- PCs) divided by a 16-bit number that the programmer loads into the
- chip. The output is a square wave whose high and low periods
- are equal. The range of the chip falls between 18.2 hertz and 1.193
- megahertz [4]. Mode 3 is used by channel 0 for the time-of-day clock
- and by channel 2 to produce the beeps and whistles found in many DOS
- programs.
- Despite the two-state output limitation, the 8253 can also play
- digitized sound. When put in mode 0, the output defaults to high.
- When a 16-bit number is then programmed into the chip, the output goes
- low for the duration of the specified number of input pulses, after
- which the output returns high. The net result is that the output wave
- is in the form of PWM. [See figure 7.]
- To program PWM sound output on the 8253, several steps must be
- taken. First, the programmable peripheral interface (PPI) chip must
- be initialized to the desired mode. Then the 8253 must be initialized
- and data sent to it. Table 2 summarizes chip ports.
-
- Table 2: Chip I/O Port Addresses
-
- Port Dec Hex
- PPI chip 97 61
- 8253 Channel 0 64 40
- 8253 Channel 1 65 41
- 8253 Channel 2 66 42
- 8253 Control Word 67 43
-
- To allow the 8253 to control the speaker, the PPI must be set
- correctly. The two lowest bits of port 97 must be turned on. The
- other bits of this port should remain untouched since they are used
- for other purposes [4].
- Port 67 is the control word register which initializes the 8253.
- Each counter is initialized by sending this port one control byte.
- Table 3 shows the meanings of the bits in the control word.
-
- Table 3: Meaning of 8253 Control Word Register
-
- bit 0 = 0, count in binary
- = 1, count in Binary Coded Decimal
- bits 1-3 = mode number (0 to 5 in binary)
- bits 4,5 = 00, latch current count for reading
- = 01, read/load low byte
- = 10, read/load high byte
- = 11, read/load low byte, then high byte
- bits 6,7 = counter number (0 to 2 in binary)
-
- (Bit 0 is least significant; 7 is most significant)
-
- When reading both bytes of the 16-bit value, a latch command prevents
- the count from changing between reading the high byte and the low
- byte. Latching is not needed when reading only a single byte. For
- example, to set channel 2 to generate musical tones in mode 3, the
- control word is 182 (B6 hex). For digital audio via mode 0, use 176
- (B0 hex).
- Ports 64, 65, and 66 are used to read and write to timers 0, 1,
- and 2 respectively. Data sent to these ports becomes the 16-bit
- number used to affect output. If only one byte is sent, the other
- byte retains its previous value.
- Listing 1 shows the general algorithm for 8-bit digital audio
- output with the 8253. 8-bit quality is achieved by leaving the high
- byte constant and sending data only to the low byte.
-
-
- Listing 1: Algorithm for Digital Audio
-
- Load Digital Audio Data;
- value = InPort(61h); -- Initialize PPI
- OutPort(61h, value OR 3);
- OutPort(43h,B0h); -- Initialize 8253 channel 2, mode 0
- OutPort(42h,00h); -- low byte of count
- OutPort(42h,00h); -- high byte of count
- OutPort(43h,90h); -- channel 2, low byte only
- loop until end of data -- Play sound
- OutPort(42h,Data);
- Wait Until Data Passes;
- OutPort(43h,B6h); -- Restore 8253
-
- Listing 2 gives an example program written for Turbo C. In the
- sample program, the user must specify whether the input data is
- signed. Some forms of digital audio storage, including PCM,
- inherently use unsigned variables. Other forms, including PWM and
- PAM, can be stored as signed variables, and notably the Amiga
- microcomputer stores digitized sound data on a scale from -128 to 127.
- Since the playback method using the 8253 must use unsigned data, a
- scaling factor of 128 must be added to all signed data [5].
-
-
- Listing 2: Turbo C Code for Digital Audio
-
- #include <conio.h>
- #include <dos.h>
- #include <io.h>
- #include <stdio.h>
- #include <stdlib.h>
-
- /* SOUND.C
-
- Author: David Chappell
- Version: 1.46d
- Date: 24 June 1990
- Method: 8253 PWM method
- */
-
- FILE *soundfile; /* input data file */
- unsigned long size; /* size of input file */
- int wait; /* time to wait between sending samples out */
- unsigned char offset; /* change signed samples to unsigned */
- unsigned int vol1, vol2; /* adjust range of data; vol2 may reach 256 */
-
- void error(char message[])
- /* Purpose: handles errors */
- {
- fprintf(stderr,"\nERROR: %s\n",message);
- exit(-1);
- }
-
- void playfile(void)
- /* Purpose: loads file and plays digitized sound */
- {
- int pause;
- unsigned int count, temp;
- unsigned char min, max, data;
- char *inputbuffer;
-
- if ((inputbuffer=(char*) calloc(size,sizeof(char))) == NULL)
- error("Not enough memory to load file");
- fread(inputbuffer, size, 1, soundfile);
- /* scale data */
- min=255; max=0;
- for (count = 0; count < size; count++) {
- data = *(inputbuffer + count)+offset;
- if (data<min)
- min=data;
- if (data>max)
- max=data;
- }
- vol1 = 64;
- vol2 = max-min+1; /* scale from 0 to 64 */
- offset -= min; /* move lowest point to zero */
- disable(); /* disable interrupts */
- for (count = 0; count < size; count++) {
- data = *(inputbuffer + count) + offset;
- temp = data * vol1 / vol2;
- data = temp + 1;
- outp(66,data); /* send sample to 8253 channel 2 */
- for (pause = 0; pause < wait; pause++);
- }
- enable(); /* enable interrupts */
- }
-
- void startspeaker(void)
- /* Purpose: initialize speaker for output */
- {
- outp(97,inp(97) | 3); /* set PPI */
- outp(67,176); /* send initial data to timer */
- outp(66,00);
- outp(66,00);
- outp(67,144); /* prepare timer chip to receive data */
- }
-
- void openfile(void)
- /* Purpose: gets information from user and opens input file */
- {
- char choice; /* key hit by user */
- char filename[80]; /* name of input file */
-
- clrscr(); /* clear screen */
- puts("What file do you want to hear?");
- if (scanf("%79s", filename) != 1 ||
- (soundfile = fopen(filename,"rb")) == NULL)
- error("Unable to open sound file");
- fseek(soundfile,0,SEEK_SET);
- size = filelength(fileno(soundfile));
- printf("\nFile size = %lu bytes\n\n",size);
- printf("What delay time do you want in FOR counter? ");
- scanf("%d",&wait);
- printf("Is the data signed? ");
- choice=getche();
- if ((choice=='Y') || (choice=='y'))
- offset=128;
- }
-
- void stopsound(void)
- /* Purpose: resets speaker to stop sound */
- {
- outp(67,182); /* restore timer to mode 3 */
- fclose(soundfile);
- }
-
- int main(void)
- {
- openfile();
- startspeaker();
- playfile();
- stopsound();
- return 0;
- }
-
- RESULTS
- The method described thus far has several limitations when put
- into practice on PCs. The 16-bit quality of the chip reduces to 7
- bits at most sample rates. Also, a background tone is produced along
- with the desired sound because of the use of PWM. Both of these
- difficulties arise from timing problems.
- In principle, the 8253 is capable of yielding high-quality 16-bit
- sound. The code given, however, can only produce approximately 7-bit
- sound. Although the 8253 has the ability to play 16-bit data, the
- timing limitations of the PC restrict the length of each pulse. At
- sample rates of about 8-13 kHz, only about six or seven bits of each
- datum can be output before the next piece of data must begin. At a
- slower sample rate of about 4-7 kHz, seven to eight bits of accuracy
- can be achieved. A higher input frequency would resolve this
- difficulty; the frequency cannot be changed in PCs, but it could be
- in other applications of the 8253.
- The maximum data value (and hence the maximum volume) can be
- calculated directly. The 8253 input rate divided by the output sample
- rate yields the number of input clock periods that pass before the
- next sample begins play:
-
- Maximum value = 1.193 MHz / sample rate
-
- For example, the maximum value for an 8 kHz sample is about 149
- (1,193,000 / 8,000), or slightly more than seven bits. The 1.193 MHz
- input frequency that feeds the 8253 thus limits the chip's sound
- capabilities.
- As an annoying side-effect, the provided algorithm creates a
- background tone. Due to the nature of PWM, at the beginning of each
- piece of data, the output changes state as it goes from high to low.
- [See figure 7.] This periodic oscillation produces a pitch equal to
- the frequency at which the sound is played. For example, an 8 kHz
- sample will produce a background tone of 8 kHz. The resulting tone
- overlays the digitized sound output. The human ear cannot detect
- pitches beyond about 22 kHz, and most people cannot hear a pitch of
- 18 kHz or higher. Thus, any sample played at 18 kHz or faster will not
- produce an audible background tone. If a given sample is not of high
- enough frequency, this problem can be alleviated by playing each piece
- of data multiple times in rapid succession so that the background tone
- is of such a high frequency that it is inaudible. For example, by
- playing each datum of a 9.5 kHz sample twice, the resulting pitch will
- be 19 kHz. The resolution problem, however, becomes severe when a
- moderate-speed sample is sent repeatedly: in order to maintain the
- original sampling rate, each pulse must be shorter, so the 8253 has
- time to process fewer bits for each datum [5].
-
-
- DISCUSSION
- Several other methods were attempted before the above method was
- discovered. Although all of these other methods yield sound of
- varying quality, most do not produce recognizable results, and none
- match the PWM method. With some work, any of the following techniques
- might be able to give better output.
- One idea is to look at each piece of data, compare it to the
- midpoint, and set the speaker bit appropriately. For example, the
- midpoint for 8-bit data is either 0 or 128 (depending on whether
- a sign bit is used), so any value above the midpoint could be set as 1
- and all other values set as 0. Modifying bit 1 (the second-lowest
- bit) of I/O port 97 (61 hex) changes the speaker's position. This
- idea could never yield very good results, however, as it gives only
- 1-bit accuracy.
- A method that has so far proved ineffective is based on PCM
- encoding. Each individual bit of input directly determines the state
- of the speaker. Manipulating bit 1 of port 97 yields the two states.
- Another possibility is to use mode 3 of the 8253 to play a pitch
- which is either directly or inversely proportional to the data.
- Although this method yields audible sound, the results are not as good
- as those of the main PWM method.
- A final idea is to use the 8253 based on pulse position
- modulation. Mode 4 of the 8253 should produce PPM output, but trials
- so far have given negative results.
-
- The given algorithm can be further improved by allowing the
- program to automatically determine the speed of the computer. In the
- current method, a delay time must be entered manually. By
- incrementing a counter while watching the system clock, a ratio of
- instructions to time can be calculated. Under PC-DOS and OS/2,
- however, the 18.2 Hz frequency of the system clock limits the accuracy
- of this approach unless channel 0 of the 8253 is reprogrammed to
- speed up the system timer.
- A better timing method would be to use an algorithm that does not
- rely on loops for timing delays. Under PC-DOS, interrupts can be
- modified so that data is played after each clock tick. By increasing
- the frequency of clock ticks and changing the timer interrupt, the
- computer can output each piece of data at a regular interval. Other
- operating systems, however, do not allow such interrupt modification.
-
- The development of this digital audio playback method has several
- implications and possibilities. A variety of applications, from games
- to word processors, can use voice and sound. On multi-tasking
- operating systems, sounds can easily be played in the background.
- Other computers could use the same ideas for audio output. When
- combined with extra hardware, this method can form a complete audio
- I/O system. When a PC acts as a terminal, this procedure will allow
- mainframes and minicomputers to play digitized sound.
- Speech interaction is currently put to several uses. For
- example, IBM's SpeechViewer helps deaf children and adults improve
- pronunciation. In addition, several programs use spoken words to help
- people learn to read and write. For blind users, IBM's ScreenReader
- program can vocally relate the text that appears on the monitor.
- There are numerous instances of disabled users benefitting from
- talking computers.
- In addition, many musicians use computers to produce and mix
- sounds. Computers can now produce rich tones that rival musical
- instruments.
- By reducing the need for extra hardware, digitized sound can
- easily be added to other programs. As an obvious example, games can
- use sound for both special effects and general entertainment. Useful
- applications, from word processors to spreadsheets, can speak to help
- visually impaired users. Inexperienced users would find a computer to
- be much friendlier if it could speak to them. A user interface that
- includes speech can help bring computers to the level of interaction
- that humans use with each other. Thus, nearly all types of software
- can benefit from the addition of speech and sound capabilities [5].
- When running under PC-DOS, only one program can be run at a time.
- The only way to allow the computer to play recorded sounds while
- continuing other work is to modify interrupts, as mentioned above.
- Under multi-tasking operating systems, such as OS/2 and Unix, the 8253
- could be continually fed data in a background task while the main
- program continues. Playing sound in the background gives more
- flexibility. For example, a communications package could verbally
- report an error while continuing to receive data, or a graphical demo
- could play music in the background while displaying pictures on the
- monitor.
- The data storage method used here is compatible with many others.
- A huge number of digitized samples are available from Macintosh,
- Amiga, and Atari ST computers. Several PC expansion boards also use
- the same storage method. Data recorded on any of this hardware can be
- played back on an ordinary PC. Whether the original sample is in the
- form of PAM, PCM, or PWM, it can play through the PC speaker.
- Furthermore, by purchasing an available expansion board or building
- one, sound recording is possible on PCs.
- As the use of speech technology grows, speech can be added to
- larger machines. According to IBM's long-range plan, all host
- computers will eventually be accessed via a PS/2 running OS/2.
- Although mainframes and minicomputers do not typically have sound
- capabilities, they could use the PS/2's speaker for speech output.
- Thus, by using PCs as terminals, the full range of computers can
- handle digital audio.
- The algorithm presented here can be used in settings other than
- in a PC. The same method could be used in any computer with an 8253
- chip, and a hardware expansion using the 8253 can be added to other
- computers. More importantly, any hardware configuration capable of
- producing output similar to PWM can, when connected to a speaker,
- produce digitized sound. Similarly, any system able to produce pulses
- similar to any digital recording method can output digitized sound.
- As a result, hardware with only two output states can play sound, and
- a digital-to-analog converter, such as found in the Amiga, is not
- required.
-
-
- CONCLUSION
- Over the past several decades, engineers have searched for ways
- to make computers both talk and play high-quality music. One solution
- to both problems, digitization of sound using pulse modulation,
- requires little processing time and is thus appropriate for
- microcomputers. Although previous usage of digitized sound has been
- limited to computers with specialized hardware, it is possible for a
- standard PC to play good quality sounds without extra hardware. As
- the computer world strides deeper into multimedia and other sound-
- based applications despite a lack of hardware standards for sound
- output on PCs, this method may prove to be invaluable in bringing
- sound to the masses. With minimal effort, any program can add a new
- dimension with speech, music, and sound effects.
-
-
- REFERENCES
-
-
- [1] Pohlmann, Ken C. Principles of Digital Audio. H. W. Sams,
- Indianapolis (1985).
-
- [2] Rosch, Winn L. The Winn Rosch Hardware Bible. Simon & Schuster,
- New York (1988).
-
- [3] Sargent, Murray, III and Richard L. Shoemaker. The IBM Personal
- Computer from the Inside Out. Addison-Wesley (1984).
-
- [4] Norton, Peter. The Peter Norton Programmer's Guide to the IBM PC.
- 1st ed. Microsoft, Redmond, WA (1985).
-
- [5] Chappell, David T. "Achieving Inexpensive Digital Audio on PCs
- for Educational Purposes". Proceedings of the Southeastern Small
- College Computing Conference. (1990).
-
-
-
- Permission to copy without fee all or part of this material is granted
- provided that the copies are not made or distributed for direct
- commercial advantage, the ACM copyright notice and the title of the
- publication and its date appear, and notice is given that copying is
- by permission of the Association for Computing Machinery. To copy
- otherwise, or to republish, requires a fee and/or specific permission.
-