edu.cmu.sphinx.frontend.frequencywarp
Class MelFrequencyFilterBank

java.lang.Object
  extended by edu.cmu.sphinx.frontend.BaseDataProcessor
      extended by edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank
All Implemented Interfaces:
Configurable, DataProcessor

public class MelFrequencyFilterBank
extends BaseDataProcessor

Filters an input power spectrum through a bank of mel-filters. The output is an array of filtered values, typically called the mel-spectrum, in which each value is the result of filtering the input spectrum through one filter. The length of the output array therefore equals the number of filters created.
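
As a minimal sketch of the relation between filters and output values, the code below applies a bank of filters to one power spectrum. It assumes each filter is stored as a plain array of weights, one per spectrum bin; the class FilterBankSketch and this representation are hypothetical and are not the MelFilter class used by Sphinx.

```java
/** Hypothetical illustration only; not part of the Sphinx API. */
final class FilterBankSketch {
    private FilterBankSketch() {}

    /**
     * Applies every filter to the same power spectrum.
     * weights[f][k] is the weight that filter f gives to spectrum bin k.
     * The result has one value per filter.
     */
    static double[] apply(double[][] weights, double[] powerSpectrum) {
        double[] melSpectrum = new double[weights.length];
        for (int f = 0; f < weights.length; f++) {
            double sum = 0.0;
            for (int k = 0; k < powerSpectrum.length; k++) {
                sum += weights[f][k] * powerSpectrum[k];
            }
            melSpectrum[f] = sum;
        }
        return melSpectrum;
    }
}
```

With two filters and a four-bin spectrum, apply returns an array of length two, one entry per filter.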

The triangular mel-filters in the filter bank are placed along the frequency axis so that each filter's center frequency follows the mel scale. In this way the filter bank mimics the critical bands of human hearing, which reflect different perceptual effects in different frequency bands. Additionally, the edges of each filter are placed so that they coincide with the center frequencies of the adjacent filters. Pictorially, the filter bank looks like:


Figure 1: A Mel-filter bank.

As you might notice in the above figure, the distance at the base from the center to the left edge differs from the distance from the center to the right edge. Since the center frequencies follow the mel-frequency scale, which is a non-linear scale that models the non-linear behavior of human hearing, the mel filter bank corresponds to a warping of the frequency axis. As can be inferred from the figure, filtering with the mel scale emphasizes the lower frequencies. A common model for the relation between frequencies in mel and linear scales is as follows:

melFrequency = 2595 * log10(1 + linearFrequency/700)
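
The relation above, with the logarithm taken in base 10 and frequencies in Hz, can be sketched in Java as follows. The class MelScale and its method names are hypothetical helpers written for illustration; they are not part of the Sphinx API.

```java
/** Hypothetical helper, not part of Sphinx: converts between Hz and mel. */
final class MelScale {
    private MelScale() {}

    /** melFrequency = 2595 * log10(1 + linearFrequency / 700) */
    static double linearToMel(double linearFrequency) {
        return 2595.0 * Math.log10(1.0 + linearFrequency / 700.0);
    }

    /** Inverse of linearToMel: solves the same relation for the linear frequency. */
    static double melToLinear(double melFrequency) {
        return 700.0 * (Math.pow(10.0, melFrequency / 2595.0) - 1.0);
    }

    public static void main(String[] args) {
        // Print the mel value for a few linear frequencies.
        for (double hz : new double[] {0.0, 130.0, 1000.0, 6800.0}) {
            System.out.printf("%7.1f Hz -> %8.2f mel%n", hz, linearToMel(hz));
        }
    }
}
```

Under this model 0 Hz maps to 0 mel and 1000 Hz maps to approximately 1000 mel, while equal steps in mel cover progressively wider ranges in Hz.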

The constants that define the filterbank are the number of filters, the minimum frequency, and the maximum frequency. The minimum and maximum frequencies determine the frequency range spanned by the filterbank. These frequencies depend on the channel and the sampling frequency that you are using. For telephone speech, since the telephone channel corresponds to a bandpass filter with cutoff frequencies of around 300Hz and 3700Hz, using limits wider than these would waste bandwidth. For clean speech, the minimum frequency should be higher than about 100Hz, since there is no speech information below it. Furthermore, by setting the minimum frequency above 50/60Hz, we get rid of the hum resulting from the AC power, if present.

The maximum frequency has to be lower than the Nyquist frequency, that is, half the sampling rate. Furthermore, there is not much information above 6800Hz that can be used for improving separation between models. Particularly for very noisy channels, a maximum frequency of around 5000Hz may help cut off the noise.
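
The constraints described above can be expressed as a simple check. This is a sketch under the assumption that a valid range needs a non-negative minimum, a minimum below the maximum, and a maximum strictly below the Nyquist frequency; the class FilterBankRange is hypothetical, and this page does not state whether MelFrequencyFilterBank itself performs such a check.

```java
/** Hypothetical helper, not part of Sphinx: sanity check for a filterbank range. */
final class FilterBankRange {
    private FilterBankRange() {}

    /**
     * Returns true if the range [minFreq, maxFreq] (in Hz) is usable at the
     * given sample rate: 0 <= minFreq < maxFreq < sampleRate / 2.
     */
    static boolean isValid(double sampleRate, double minFreq, double maxFreq) {
        double nyquist = sampleRate / 2.0;
        return minFreq >= 0.0 && minFreq < maxFreq && maxFreq < nyquist;
    }

    public static void main(String[] args) {
        System.out.println(isValid(8000.0, 200.0, 3500.0)); // below Nyquist (4000 Hz)
        System.out.println(isValid(8000.0, 200.0, 4000.0)); // not below Nyquist
    }
}
```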

Typical values for the constants defining the filter bank are:

Sample rate (Hz)         16000   11025    8000
numberFilters               40      36      31
minimumFrequency (Hz)      130     130     200
maximumFrequency (Hz)     6800    5400    3500
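
To illustrate how these constants could translate into filter positions, the sketch below spaces numberFilters + 2 points equally on the mel scale between the minimum and maximum frequencies, so that each filter's edges coincide with the centers of its neighbors. It assumes the base-10 form of the mel relation given earlier. The class MelEdges is hypothetical, and the actual Sphinx implementation may differ, for example in how it aligns edges with spectrum bins.

```java
/** Hypothetical illustration, not part of Sphinx: edge and center frequencies. */
final class MelEdges {
    private MelEdges() {}

    static double linearToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    static double melToLinear(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /**
     * Returns numberFilters + 2 frequencies in Hz, equally spaced in mel.
     * Filter i has left edge points[i], center points[i + 1],
     * and right edge points[i + 2].
     */
    static double[] points(int numberFilters, double minFreq, double maxFreq) {
        double minMel = linearToMel(minFreq);
        double maxMel = linearToMel(maxFreq);
        double step = (maxMel - minMel) / (numberFilters + 1);
        double[] points = new double[numberFilters + 2];
        for (int i = 0; i < points.length; i++) {
            points[i] = melToLinear(minMel + i * step);
        }
        return points;
    }

    public static void main(String[] args) {
        // Typical values for a 16000 Hz sample rate, taken from the table above.
        double[] p = points(40, 130.0, 6800.0);
        System.out.printf("first filter: left=%.1f center=%.1f right=%.1f%n",
                p[0], p[1], p[2]);
        System.out.printf("last filter:  left=%.1f center=%.1f right=%.1f%n",
                p[39], p[40], p[41]);
    }
}
```

Because the points are equally spaced in mel, the filters become wider in Hz as the center frequency increases, which matches the asymmetric triangles in Figure 1.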

Davis and Mermelstein showed that mel-frequency cepstral coefficients have robust characteristics that are well suited to speech recognition. For details, see Davis and Mermelstein, Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980.

See Also:
MelFilter

Field Summary
static java.lang.String PROP_MAX_FREQ
          The name of the Sphinx Property for the maximum frequency covered by the filterbank.
static double PROP_MAX_FREQ_DEFAULT
          The default value of PROP_MAX_FREQ.
static java.lang.String PROP_MIN_FREQ
          The name of the Sphinx Property for the minimum frequency covered by the filterbank.
static double PROP_MIN_FREQ_DEFAULT
          The default value of PROP_MIN_FREQ.
static java.lang.String PROP_NUMBER_FILTERS
          The name of the Sphinx Property for the number of filters in the filterbank.
static int PROP_NUMBER_FILTERS_DEFAULT
          The default value for PROP_NUMBER_FILTERS.
 
Constructor Summary
MelFrequencyFilterBank()
           
 
Method Summary
 Data getData()
          Reads the next Data object, which is the power spectrum of an audio input frame.
 void initialize()
          Initializes this DataProcessor.
 void newProperties(PropertySheet ps)
          This method is called when this configurable component has new data.
 void register(java.lang.String name, Registry registry)
          Register my properties.
 
Methods inherited from class edu.cmu.sphinx.frontend.BaseDataProcessor
getName, getPredecessor, getTimer, setPredecessor, toString
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PROP_NUMBER_FILTERS

public static final java.lang.String PROP_NUMBER_FILTERS
The name of the Sphinx Property for the number of filters in the filterbank.

See Also:
Constant Field Values

PROP_NUMBER_FILTERS_DEFAULT

public static final int PROP_NUMBER_FILTERS_DEFAULT
The default value for PROP_NUMBER_FILTERS.

See Also:
Constant Field Values

PROP_MIN_FREQ

public static final java.lang.String PROP_MIN_FREQ
The name of the Sphinx Property for the minimum frequency covered by the filterbank.

See Also:
Constant Field Values

PROP_MIN_FREQ_DEFAULT

public static final double PROP_MIN_FREQ_DEFAULT
The default value of PROP_MIN_FREQ.

See Also:
Constant Field Values

PROP_MAX_FREQ

public static final java.lang.String PROP_MAX_FREQ
The name of the Sphinx Property for the maximum frequency covered by the filterbank.

See Also:
Constant Field Values

PROP_MAX_FREQ_DEFAULT

public static final double PROP_MAX_FREQ_DEFAULT
The default value of PROP_MAX_FREQ.

See Also:
Constant Field Values
Constructor Detail

MelFrequencyFilterBank

public MelFrequencyFilterBank()
Method Detail

register

public void register(java.lang.String name,
                     Registry registry)
              throws PropertyException
Description copied from interface: Configurable
Register my properties. This method is called once early in the lifetime of the component, shortly after the component is constructed. This component should register any configuration properties that it needs to register. If this configurable extends another configurable, super.register should also be called.

Specified by:
register in interface Configurable
Overrides:
register in class BaseDataProcessor
Throws:
PropertyException

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component has new data. The component should first validate the data. If it is bad, the component should return false. If the data is good, the component should record the data internally and return true.

Specified by:
newProperties in interface Configurable
Overrides:
newProperties in class BaseDataProcessor
Throws:
PropertyException

initialize

public void initialize()
Description copied from class: BaseDataProcessor
Initializes this DataProcessor. This is typically called after the DataProcessor has been configured.

Specified by:
initialize in interface DataProcessor
Overrides:
initialize in class BaseDataProcessor

getData

public Data getData()
             throws DataProcessingException
Reads the next Data object, which is the power spectrum of an audio input frame. Signals are returned unmodified.

Specified by:
getData in interface DataProcessor
Specified by:
getData in class BaseDataProcessor
Returns:
the next available Data or Signal object, or null if no Data is available
Throws:
DataProcessingException - if there is a data processing error