Microsoft DirectX 8.0

The Filter Graph and Its Components

This article provides a high-level overview of the components that make up Microsoft® DirectShow® filter graphs. It is intended as an introduction for developers who will write their own custom DirectShow filters, and for DirectShow application developers who want to know what goes on behind the scenes when their application uses DirectShow. Applications, however, often do not need to deal with the inner workings of a DirectShow filter graph; in most situations, details such as adding filters, connecting pins, and setting a synchronization source are handled automatically by the Filter Graph Manager and by the filters and pins themselves.

Note for Filter Developers: To use your own custom transform or effect in DirectShow, it is no longer necessary to write a native DirectShow filter. You can now write a DirectX Media Object (DMO) that performs the transform, and use it in a filter graph through the DMO Wrapper filter. This frees you from the details of the filter connection and synchronization mechanisms.

The Filter Graph

Whenever a media file or stream is played, recorded, captured, broadcast, or otherwise processed, the work is done by connecting one or more filters together in a configuration called a filter graph. The graph can be built manually by an application or automatically by the Graph Builder or Filter Graph Manager. In either case, the process usually begins with the source filter and is always based on two main factors: the number and media types of the streams that a filter expects as input, and the number and media types of the streams that it outputs. The following illustration shows a simple filter graph for playing back an AVI file with compressed video.

[Illustration: Async File Source -> AVI Splitter -> AVI Decompressor -> video renderer, with a second output from the AVI Splitter -> Default DirectSound Device]

The arrows in the graph depict the direction in which data flows. The source filter here is the Async File Source. This filter simply reads the bytes from the file and passes them on to the AVI Splitter filter, which recognizes the Audio Video Interleave (AVI) format, including the video compression scheme. The AVI Splitter parses the data into time-stamped media samples and passes the video samples downstream to the AVI Decompressor. For video, each sample contains one frame. The AVI Decompressor finds the correct codec to decompress the samples and passes the decompressed video frames to the video renderer filter, which displays the video on the computer screen. Because the audio samples in this example are uncompressed, the AVI Splitter can connect directly to the Default DirectSound Device filter.
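The idea that graph building is driven by matching output media types to input media types can be sketched as a small search. The following Python model is only an illustration of that idea, not the algorithm the Filter Graph Manager actually uses: the `FILTERS` table, the media-type labels, and the `build_chain` function are all hypothetical, and each filter is assumed to accept a single input type.

```python
from collections import deque

# Hypothetical filter table: name -> (accepted input type, list of output types).
# The media-type strings are illustrative labels, not real DirectShow GUIDs.
FILTERS = {
    "Async File Source": (None, ["byte-stream"]),
    "AVI Splitter": ("byte-stream", ["compressed-video", "pcm-audio"]),
    "AVI Decompressor": ("compressed-video", ["raw-video"]),
    "Video Renderer": ("raw-video", []),
    "Default DirectSound Device": ("pcm-audio", []),
}


def build_chain(source, renderer):
    """Breadth-first search for a chain of filters from source to renderer.

    Two filters may be joined when one of the upstream filter's output
    types equals the downstream filter's input type.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == renderer:
            return path
        for out_type in FILTERS[current][1]:
            for name, (in_type, _) in FILTERS.items():
                if in_type == out_type and name not in seen:
                    seen.add(name)
                    queue.append(path + [name])
    return None  # no chain of compatible filters exists


video_path = build_chain("Async File Source", "Video Renderer")
audio_path = build_chain("Async File Source", "Default DirectSound Device")
print(" -> ".join(video_path))
print(" -> ".join(audio_path))
```

With this table, the search reproduces the two branches of the playback graph described above: the video branch passes through the AVI Decompressor, while the audio branch goes straight from the AVI Splitter to the audio renderer.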

Filters

Filters are the basic building blocks of DirectShow. DirectShow separates the processing of multimedia data into discrete steps, and a filter represents one (or sometimes more than one) processing step. This enables applications to "mix and match" filters to perform many different kinds of operations on many different media formats using many different classes of hardware and software devices. For example, the Async File Source filter reads a file from a disk, the TV Tuner filter changes the channel on a TV-capture card, and the MPEG-2 Splitter filter parses the audio and video data in an MPEG stream so that it can be decoded. Although each of these filters does something quite different internally, from the point of view of an application each is just a DirectShow filter with certain standard characteristics: support for the IBaseFilter interface, and one or more input or output pins that represent its connections to other DirectShow filters.

All DirectShow filters fall into one of these three categories: source filters, transform filters, and renderer filters.

Source Filters

Source filters present the raw multimedia data for processing. They may get it from a file on a hard disk (like the Async File Source filter in the diagram above), from a CD or DVD drive, or from a "live" source such as a television receiver card or a capture card connected to a digital camera. (Source filters for all these scenarios are included with DirectShow and ship with every copy of Microsoft Windows 9x and Windows 2000.) Some source filters simply pass the raw data on to a parser or splitter filter, while others also perform the parsing step themselves.

Transform Filters

Transform filters accept either raw or partially processed data and process it further before passing it on. There are many types of transform filters, including parsers that split raw byte streams into samples or frames, compressors and decompressors, and format converters. DirectShow includes many popular codecs. In the diagram above, the AVI Splitter and the AVI Decompressor are both transform filters. (One transform filter that DirectShow does not provide is an MPEG-2 decoder. MPEG-2 decoders are available in either hardware or software implementations from third parties.)

Renderer Filters

Renderer filters generally accept fully processed data and play it on the system's monitor, through the speakers, or possibly through some external device. Also included in this category are "file-writer" filters that save data to disk or other persistent storage. The video renderer filters that come with DirectShow use DirectDraw to display video, and the default audio renderer filter uses DirectSound to play audio, as illustrated in the diagram above.
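As a rule of thumb, the three categories can be told apart by the direction of a filter's pins: a source filter only produces data, a renderer only consumes it, and a transform filter does both. The sketch below encodes that heuristic. It is an assumption made for illustration, not a rule stated by DirectShow, and `PinDirection` and `classify_filter` are hypothetical names.

```python
from enum import Enum


class PinDirection(Enum):
    INPUT = "input"
    OUTPUT = "output"


def classify_filter(pin_directions):
    """Guess a filter's category from the directions of its pins (heuristic)."""
    has_input = PinDirection.INPUT in pin_directions
    has_output = PinDirection.OUTPUT in pin_directions
    if has_input and has_output:
        return "transform"   # data comes in, is processed, and goes out
    if has_output:
        return "source"      # data only leaves the filter
    if has_input:
        return "renderer"    # data only arrives at the filter
    raise ValueError("a filter needs at least one pin")


# One input pin and two output pins, like the AVI Splitter described above.
splitter = classify_filter(
    [PinDirection.INPUT, PinDirection.OUTPUT, PinDirection.OUTPUT]
)
file_source = classify_filter([PinDirection.OUTPUT])
video_renderer = classify_filter([PinDirection.INPUT])
```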

Filters for playing, converting, and capturing many different media formats are supplied with DirectShow and ship with every copy of Windows 9x and Windows 2000. Developers can also build their own custom filters for handling either custom or standard data formats. In DirectX 8, DirectShow also simplifies "filter" development by enabling developers to write DirectX Media Objects (DMOs) and let the DMO Wrapper filter handle all the synchronization and pin connection details. DMOs are easier to develop and test than filters, and they do not depend on the DirectShow filter graph architecture. You can write a DMO once and reuse it wherever you need it, whether or not the host is a DirectShow application.

Pins

As illustrated in the preceding diagram, the multimedia data in a filter graph moves downstream from a source filter, through zero or more intermediate filters, and finally to a renderer filter. Pins handle the low-level details of the data transfer between filters. A pin is a COM object that supports the IPin COM interface, has a direction (input or output), and is associated with a particular filter in the graph. A pin represents the point of connection with another filter: an output pin on an upstream filter connects to an input pin on the next filter downstream. Pins know which media types they can support, and they negotiate the media type when two filters initially connect. Once the media type is agreed upon, the pins negotiate the details of how they will transfer data once the filter graph starts running.
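The media-type negotiation described above can be pictured with a toy model. In this sketch the `Pin` class and the `connect` function are hypothetical stand-ins rather than the IPin interface, and the policy of taking the output pin's first type that the input pin also supports is an assumption chosen to keep the example short.

```python
class Pin:
    """Toy stand-in for a pin; not the real IPin interface."""

    def __init__(self, owner, direction, media_types):
        self.owner = owner              # name of the filter that owns the pin
        self.direction = direction      # "input" or "output"
        self.media_types = media_types  # supported types, in order of preference
        self.connected_to = None
        self.agreed_type = None


def connect(output_pin, input_pin):
    """Connect two pins using the first media type both sides support."""
    if output_pin.direction != "output" or input_pin.direction != "input":
        raise ValueError("connections run from an output pin to an input pin")
    for media_type in output_pin.media_types:
        if media_type in input_pin.media_types:
            output_pin.connected_to = input_pin
            input_pin.connected_to = output_pin
            output_pin.agreed_type = input_pin.agreed_type = media_type
            return media_type
    return None  # no common media type, so the pins cannot connect


splitter_out = Pin("AVI Splitter", "output", ["compressed-video"])
decoder_in = Pin("AVI Decompressor", "input", ["compressed-video", "raw-video"])
audio_in = Pin("Default DirectSound Device", "input", ["pcm-audio"])

agreed = connect(splitter_out, decoder_in)   # both sides support compressed video
other_out = Pin("AVI Splitter", "output", ["compressed-video"])
rejected = connect(other_out, audio_in)      # no shared type, so no connection
```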

Media Samples

After raw bytes have been pulled into the graph, either from a local file or capture card or some other source, the bytes must be parsed into meaningful units, called media samples. Sometimes a source filter does the parsing, and sometimes a separate parsing filter performs this task. In DirectShow, a media sample is wrapped in a COM object that implements IMediaSample2. In addition to the actual multimedia data, the object contains information including the specific media type and the synchronization times. A media sample object containing video data holds the data for one video frame. For audio, a media sample object holds the data for several audio samples. In either case, when data moves downstream through a graph from one filter to the next, it is in the form of media sample objects.
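To make the idea of a time-stamped media sample concrete, the sketch below splits a raw byte string into one sample per video frame. `MediaSample` and `parse_video` are hypothetical names, and the fixed frame size is a simplifying assumption (real compressed frames vary in size). DirectShow expresses reference times in 100-nanosecond units, which the sketch follows.

```python
from dataclasses import dataclass

UNITS_PER_SECOND = 10_000_000  # one unit is 100 nanoseconds


@dataclass
class MediaSample:
    """Toy stand-in for a media sample object (not IMediaSample itself)."""
    data: bytes
    media_type: str
    start_time: int  # in 100-nanosecond units
    end_time: int    # in 100-nanosecond units


def parse_video(raw, frame_size, frames_per_second):
    """Split raw bytes into one time-stamped sample per fixed-size frame."""
    duration = UNITS_PER_SECOND // frames_per_second
    samples = []
    for index, offset in enumerate(range(0, len(raw), frame_size)):
        samples.append(MediaSample(
            data=raw[offset:offset + frame_size],
            media_type="raw-video",
            start_time=index * duration,
            end_time=(index + 1) * duration,
        ))
    return samples


# Twelve bytes at four bytes per frame give three samples at 25 frames per second.
samples = parse_video(bytes(12), frame_size=4, frames_per_second=25)
```

Each sample carries its own start and end time, so a downstream filter can decide when to present it without knowing anything about the file it came from.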

Allocators

When two filters connect, their pins must agree on the details of how the media sample objects will be transported from the upstream filter to the downstream filter. In this context, to "connect" means to agree on the size, location, and number of samples that will be used. The size of the samples depends on the media type and format, and the buffers may be located either in main memory or on a hardware device such as a video capture card. The creation and management of the samples is handled by an allocator, a COM object that is usually created by the input pin on the downstream filter.
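An allocator can be thought of as a fixed pool of buffers whose number and size are settled when the pins connect. The following sketch shows that idea only. The `Allocator` class and its methods are hypothetical, and returning `None` when the pool is empty is a simplification; a real implementation might instead wait until a buffer is released.

```python
class Allocator:
    """Toy buffer pool; the real allocator is a COM object."""

    def __init__(self, buffer_count, buffer_size):
        # All buffers are created up front, once the pins have agreed
        # on how many are needed and how large each one must be.
        self._free = [bytearray(buffer_size) for _ in range(buffer_count)]
        self.buffer_size = buffer_size

    def get_buffer(self):
        """Hand out a free buffer, or None when every buffer is in use."""
        return self._free.pop() if self._free else None

    def release_buffer(self, buffer):
        """Return a buffer to the pool so it can be reused."""
        self._free.append(buffer)

    @property
    def free_count(self):
        return len(self._free)


allocator = Allocator(buffer_count=2, buffer_size=1024)
first = allocator.get_buffer()
second = allocator.get_buffer()
third = allocator.get_buffer()     # pool exhausted, so this is None
allocator.release_buffer(first)    # downstream filter has finished with it
reused = allocator.get_buffer()    # the same buffer object is handed out again
```

Because buffers are recycled rather than created per sample, memory use stays bounded no matter how long the graph runs.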

See Writing DirectShow Filters for details of allocator creation and the negotiations that input and output pins perform when connecting. In certain video capture scenarios, an application may need to specify the size of the buffers that should be allocated, and interfaces are provided for this purpose. In the vast majority of cases, the details of buffer allocation are completely transparent to applications. Note, however, that the "movement" of data in a filter graph does not always involve a copy operation. In the connection process, pins will "reuse" upstream buffers whenever possible in order to maximize throughput.

Clocks

In any operation related to multimedia, it is vital to synchronize the samples so that video frames are displayed at the proper rate, the audio stream does not get ahead of the video, and so on. A DirectShow filter graph therefore always contains exactly one clock that all filters use to time-stamp, process, or render media samples. The Filter Graph Manager selects a clock (or provides one if needed) and instructs all the filters to use that clock as the synchronization source.
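The role of a single shared clock can be illustrated with a small model in which two renderers consult the same clock to decide when a sample is due. `ReferenceClock`, `Renderer`, and their methods are hypothetical names. The clock is advanced by hand to keep the example deterministic, and sample times are assumed to be measured from a clock time of zero.

```python
class ReferenceClock:
    """Toy clock that is advanced manually so the example is deterministic."""

    def __init__(self):
        self._now = 0  # current time in 100-nanosecond units

    def get_time(self):
        return self._now

    def advance(self, units):
        self._now += units


class Renderer:
    def __init__(self, name):
        self.name = name
        self.clock = None

    def set_sync_source(self, clock):
        self.clock = clock

    def time_until(self, sample_start):
        """How long to wait before presenting a sample (0 if it is already due)."""
        return max(0, sample_start - self.clock.get_time())


clock = ReferenceClock()
video = Renderer("video")
audio = Renderer("audio")
for renderer in (video, audio):  # every filter is handed the same clock
    renderer.set_sync_source(clock)

clock.advance(300_000)
wait_video = video.time_until(400_000)  # this sample is due 100_000 units from now
wait_audio = audio.time_until(250_000)  # this sample is already late, so no wait
```

Because both renderers measure against the same clock, a video frame and an audio sample with the same time stamp are presented together.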