Microsoft DirectX 8.0

Audio Capture

Using Microsoft® DirectShow®, an application can capture audio data from devices such as microphones or tape players via the inputs on a system's sound card. Typical scenarios include:

End users have several options for capturing audio from the sound card to the hard drive. Most cards provide applications for mixing and recording from their audio inputs. The Microsoft® Windows® operating system provides Sound Recorder, a simple utility application for recording from a microphone. Microsoft® Windows Media™ Tools includes the Windows Media Encoder stand-alone application that can perform these tasks. The Windows Media Encoder can also be incorporated into a DirectShow application as a Microsoft® DirectX® Media Object (DMO). This section describes how to integrate audio capture functionality within your own application using DirectShow.

The Audio Capture Filter

Audio capture from the analog inputs on sound cards is enabled through the Audio Capture filter. This filter uses the Microsoft® Platform SDK waveInXXX APIs (waveInOpen, waveInStart, waveInGetID, etc.) to enable any device whose driver supports these APIs to participate in a DirectShow filter graph. The Audio Capture filter exposes all inputs on the card, such as the microphone and line-in input. You can also capture from the CD-Audio input, but this audio stream has already gone through the digital-to-analog converter, so there will be a loss of sound quality from the original CD. For direct digital capture from the CD, see the Windows Media 7 SDK.

The number, type, characteristics and name of each input pin depend on the sound card and its driver. To fully understand and exploit the capabilities of a particular sound card, you will need the documentation from the card's manufacturer.

The Audio Capture filter represents the inputs on the card as regular DirectShow input pins. These pins represent what the driver exposes as input mixer lines. No data streams through these pins; they simply provide a means to enable and set the levels for each line. The type of control available on each line is driver-dependent; for example, a pin can only be used to control the bass and treble on an input line if the driver supports it.

To view the Audio Capture Filter in the GraphEdit utility program, do the following:

Each card on the system is represented by a separate instance of the filter. The pins you see on your filter may be different or have different names, depending on the sound card driver on your system. Each of these pins supports the IAMAudioInputMixer interface, which enables an application to select the input or inputs it is interested in by using the IAMAudioInputMixer::put_Enable method.

In addition to enabling or disabling each individual input, an application can also set the individual bass, treble and volume levels. If multiple inputs are being captured simultaneously, the overall mixing, bass, treble and volume levels can be controlled through the IAMAudioInputMixer interface that is implemented on the filter itself.

The filter also implements IAMBufferNegotiation, which is useful for controlling latency in audio preview. The Audio Capture filter defaults to a half-second buffer, which is optimal for capturing but causes a half-second delay in rendering if you preview the live audio as you capture it. To reduce this latency, use the IAMBufferNegotiation::SuggestAllocatorProperties method to specify a smaller size, in bytes, for the cbBuffer member of the ALLOCATOR_PROPERTIES structure. An 80-millisecond buffer is generally safe, but buffers of 30 or 40 milliseconds may also be sufficient. If the buffers are too small, the sound quality will be degraded.
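Because cbBuffer is expressed in bytes, a latency target in milliseconds has to be converted using the byte rate of the capture format. The following is a minimal sketch of that arithmetic, not code from the SDK. BufferBytesForLatency is a hypothetical helper, and its avgBytesPerSec and blockAlign parameters are assumed to come from the nAvgBytesPerSec and nBlockAlign members of the WAVEFORMATEX structure that describes the capture format.

```cpp
#include <cstdint>

// Hypothetical helper (not part of DirectShow): converts a target latency in
// milliseconds into a buffer size in bytes, suitable for the cbBuffer member
// of ALLOCATOR_PROPERTIES.
//   avgBytesPerSec  assumed to be WAVEFORMATEX::nAvgBytesPerSec
//   blockAlign      assumed to be WAVEFORMATEX::nBlockAlign (bytes per frame)
long BufferBytesForLatency(std::uint32_t avgBytesPerSec,
                           std::uint16_t blockAlign,
                           std::uint32_t latencyMs)
{
    // 64-bit arithmetic so the multiplication cannot overflow.
    std::uint64_t bytes =
        static_cast<std::uint64_t>(avgBytesPerSec) * latencyMs / 1000;

    if (blockAlign > 0) {
        // Round down to a whole number of sample frames,
        // but never return less than one frame.
        bytes -= bytes % blockAlign;
        if (bytes < blockAlign) {
            bytes = blockAlign;
        }
    }
    return static_cast<long>(bytes);
}
```

For example, 16-bit stereo audio at 44.1 kHz has a byte rate of 176,400 bytes per second, so BufferBytesForLatency(176400, 4, 80) yields 14,112 bytes. An application would typically place this value in cbBuffer, set the remaining ALLOCATOR_PROPERTIES members to -1 to request the defaults, and call SuggestAllocatorProperties before the output pin is connected.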

The available sampling rates and audio formats for capture are determined by the driver. Use the IAMStreamConfig interface on the Audio Capture Filter's output pin to enumerate the available sampling rates and formats. The filter can connect downstream to any filter that accepts the output pin's media type. The audio data can then be rendered through the speakers, transformed with custom transform filters, compressed and/or written to a file.
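When an application works with uncompressed PCM formats, each combination of channel count, sampling rate and sample size implies a fixed frame size and byte rate. The following sketch shows that relationship under stated assumptions: PcmFormat is a portable stand-in that mirrors only the size-related members of the Windows WAVEFORMATEX structure, and MakePcmFormat is a hypothetical helper, not an SDK function. Whether a given combination is actually available still depends on what the driver reports through IAMStreamConfig.

```cpp
#include <cstdint>

// Portable stand-in for the size-related members of WAVEFORMATEX.
// Member names follow the Windows structure; the type itself is illustrative.
struct PcmFormat {
    std::uint16_t nChannels;        // 1 = mono, 2 = stereo
    std::uint32_t nSamplesPerSec;   // sampling rate in Hz
    std::uint16_t wBitsPerSample;   // 8 or 16 for typical PCM capture
    std::uint16_t nBlockAlign;      // bytes per sample frame
    std::uint32_t nAvgBytesPerSec;  // bytes of audio per second
};

// Hypothetical helper: fills in the derived members for uncompressed PCM.
PcmFormat MakePcmFormat(std::uint16_t channels,
                        std::uint32_t samplesPerSec,
                        std::uint16_t bitsPerSample)
{
    PcmFormat f{};
    f.nChannels = channels;
    f.nSamplesPerSec = samplesPerSec;
    f.wBitsPerSample = bitsPerSample;
    // One frame holds one sample for every channel.
    f.nBlockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    // Byte rate is frames per second times bytes per frame.
    f.nAvgBytesPerSec = samplesPerSec * f.nBlockAlign;
    return f;
}
```

The nAvgBytesPerSec value computed here is the figure needed to translate a buffer duration into a buffer size in bytes.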

Audio Capture Filter Graphs

The following are some possible filter graphs for various audio capture solutions:

Note  The WavDest filter is provided as a sample that you must build and register yourself.

Creating an Audio Capture Filter Graph

Creating an audio capture filter graph involves three basic tasks:

  1. Building the graph
  2. Enumerating the audio capture sources
  3. Enabling the user to select an input line

Building the Graph

The technique for building an audio capture graph that writes to a file depends on the format of the file you wish to create. If you will be saving to AVI format, use the Capture Graph Builder to build the graph. This is the simplest audio capture scenario. The resulting file will have one audio stream and no video stream. If you are saving to ASF format, you can also use the Capture Graph Builder, but you must first install the Windows Media SDK and obtain a software key to unlock it. See How to Write a Capture Application for more information on using the Capture Graph Builder. Much of the information in that article applies equally to audio-only graphs.

If you are writing to a WAV file, you cannot use the Capture Graph Builder because it does not know about the WavDest filter. DirectShow supplies this filter only as a sample that you must compile and register. Once WavDest is registered on the local system, you must build the graph manually: add the Audio Capture filter, the WavDest filter and the File Writer filter to the graph using the IFilterGraph::AddFilter method, and then connect the pins using IFilterGraph::ConnectDirect.

Enumerating the Audio Capture Sources

After you have created an instance of the Filter Graph Manager and/or Capture Graph Builder, use the System Device Enumerator to retrieve a list of all the available audio capture devices on the system. See the article on Enumerating Devices and Filters for details on how to do this. The first device in the returned list is the one that the user has selected as the preferred sound recording device. (To select the preferred device, click Sounds and Multimedia in Control Panel.) The device enumeration tutorial also shows how to retrieve the friendly name string identifying the device. These strings can then be inserted into a list box to enable the user to select from among multiple devices.

Enabling the User to Select an Input Line

After the Audio Capture filter is added to the graph, you can use the IAMAudioInputMixer interface methods to select the lines you want to enable and to set the volume, bass and treble levels for each line (if the driver supports them). The filter also has a property page that enables the user to select from the available pins and set the various levels. For more information, see Displaying A Filter's Property Page and Audio Capture Filter. Alternatively, an application can enumerate the pins programmatically and create its own user interface. For information on enumerating pins, see Enumerating Pins.