java.lang.Object
  edu.cmu.sphinx.frontend.BaseDataProcessor
    edu.cmu.sphinx.frontend.frequencywarp.MelFrequencyFilterBank
Filters an input power spectrum through a bank of mel-filters. The output is an array of filtered values, typically called the mel-spectrum, in which each value is the result of filtering the input spectrum through one individual filter. The length of the output array therefore equals the number of filters created.
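The filtering step described above can be sketched as a weighted sum per filter. This is only an illustration: the class name `FilterBankSketch`, the method `applyFilterBank`, and the dense weight-matrix layout are assumptions made for this example and are not part of the Sphinx API; the actual class may store and apply its filters differently.

```java
import java.util.Arrays;

final class FilterBankSketch {

    /**
     * Applies every filter to the same power spectrum.
     * weights[i][k] is the weight of filter i at spectrum bin k
     * (a hypothetical dense layout, chosen here for readability).
     */
    static double[] applyFilterBank(double[][] weights, double[] powerSpectrum) {
        // One output value per filter, so the output length equals the filter count.
        double[] melSpectrum = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            if (weights[i].length != powerSpectrum.length) {
                throw new IllegalArgumentException(
                        "Filter " + i + " does not match the spectrum length");
            }
            double sum = 0.0;
            for (int k = 0; k < powerSpectrum.length; k++) {
                sum += weights[i][k] * powerSpectrum[k];
            }
            melSpectrum[i] = sum;
        }
        return melSpectrum;
    }

    public static void main(String[] args) {
        // Two toy triangular filters over eight bins. The right edge of the
        // first filter sits at the center of the second, and the second
        // filter's right slope is longer than its left slope.
        double[][] weights = {
            {0.0, 1.0, 0.5, 0.0, 0.0,  0.0, 0.0,  0.0},
            {0.0, 0.0, 0.5, 1.0, 0.75, 0.5, 0.25, 0.0}
        };
        double[] powerSpectrum = {1.0, 2.0, 4.0, 2.0, 4.0, 2.0, 4.0, 1.0};
        System.out.println(Arrays.toString(applyFilterBank(weights, powerSpectrum)));
        // prints [4.0, 9.0]
    }
}
```

With two filters the result has two elements, regardless of how many bins the input spectrum has.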
The triangular mel-filters in the filter bank are placed along the frequency axis so that each filter's center frequency follows the mel scale. In this way the filter bank mimics the critical bands of hearing, which reflect different perceptual effects in different frequency bands. In addition, the edges of each filter coincide with the center frequencies of the adjacent filters. Pictorially, the filter bank looks like:
[Figure: triangular mel-filters along the frequency axis]
As you might notice in the figure above, the distance along the base from the center to the left edge differs from the distance from the center to the right edge. Because the center frequencies follow the mel-frequency scale, a non-linear scale that models the non-linear behavior of human hearing, the mel filter bank corresponds to a warping of the frequency axis. As the figure suggests, filtering with the mel scale emphasizes the lower frequencies. A common model for the relation between frequencies on the mel and linear scales is as follows:
melFrequency = 2595 * log10(1 + linearFrequency / 700)
Here log10 is the base-10 logarithm and linearFrequency is given in Hz.
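The relation above, together with its algebraic inverse, can be written as a pair of small helper methods. This is a minimal sketch: the class and method names (`MelScaleSketch`, `linearToMel`, `melToLinear`) are hypothetical and are not taken from the Sphinx sources.

```java
final class MelScaleSketch {

    /** Converts a linear frequency in Hz to mel, using the formula above. */
    static double linearToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    /** Inverse mapping, obtained by solving the formula above for the linear frequency. */
    static double melToLinear(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    public static void main(String[] args) {
        // Round-trip a few frequencies to show that the two methods are inverses.
        for (double hz : new double[] {130.0, 1000.0, 6800.0}) {
            double mel = linearToMel(hz);
            System.out.printf("%.1f Hz -> %.1f mel -> %.1f Hz%n", hz, mel, melToLinear(mel));
        }
    }
}
```

With these constants, 0 Hz maps to 0 mel and 1000 Hz maps to approximately 1000 mel; above that point, equal steps in Hz correspond to progressively smaller steps in mel.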
The constants that define the filterbank are the number of filters, the minimum frequency, and the maximum frequency. The minimum and maximum frequencies determine the frequency range spanned by the filterbank. These frequencies depend on the channel and the sampling frequency that you are using. For telephone speech, since the telephone channel corresponds to a bandpass filter with cutoff frequencies of around 300Hz and 3700Hz, using limits wider than these would waste bandwidth. For clean speech, the minimum frequency should be higher than about 100Hz, since there is no speech information below it. Furthermore, by setting the minimum frequency above 50/60Hz, we get rid of the hum resulting from the AC power, if present.
The maximum frequency has to be lower than the Nyquist frequency, that is, half the sampling rate. Furthermore, there is little information above 6800Hz that helps improve the separation between models. For very noisy channels in particular, a maximum frequency of around 5000Hz may help cut off the noise.
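To make the role of the three constants concrete, the sketch below derives the filter edge and center frequencies from the number of filters, the minimum frequency, and the maximum frequency, and rejects a maximum frequency at or above the Nyquist frequency. It assumes the points are spaced uniformly on the mel scale; the class and method names are hypothetical, and the actual class may position or quantize its filters differently (for example, by snapping to spectrum bins).

```java
final class MelFilterLayoutSketch {

    static double linearToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    static double melToLinear(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /**
     * Returns numberFilters + 2 frequencies in Hz, uniformly spaced in mel.
     * Filter i uses points[i] as its left edge, points[i + 1] as its center,
     * and points[i + 2] as its right edge, so every edge coincides with the
     * center of an adjacent filter.
     */
    static double[] filterPoints(int numberFilters, double minFreq,
                                 double maxFreq, double sampleRate) {
        if (numberFilters < 1) {
            throw new IllegalArgumentException("Need at least one filter");
        }
        if (minFreq < 0.0 || minFreq >= maxFreq) {
            throw new IllegalArgumentException("Need 0 <= minFreq < maxFreq");
        }
        if (maxFreq >= sampleRate / 2.0) {
            throw new IllegalArgumentException(
                    "maxFreq must be lower than the Nyquist frequency");
        }
        double minMel = linearToMel(minFreq);
        double step = (linearToMel(maxFreq) - minMel) / (numberFilters + 1);
        double[] points = new double[numberFilters + 2];
        for (int i = 0; i < points.length; i++) {
            points[i] = melToLinear(minMel + i * step);
        }
        return points;
    }

    public static void main(String[] args) {
        double[] points = filterPoints(40, 130.0, 6800.0, 16000.0);
        // The gaps between neighboring points grow with frequency, which is
        // why each triangle is wider on its right side than on its left.
        for (int i = 0; i < 4; i++) {
            System.out.printf("point %d: %.1f Hz%n", i, points[i]);
        }
    }
}
```

Because the spacing is uniform in mel but not in Hz, the low-frequency filters are narrow and densely packed while the high-frequency filters are wide.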
Typical values for the constants defining the filter bank are:
Sample rate (Hz) | 16000 | 11025 | 8000
numberFilters | 40 | 36 | 31
minimumFrequency (Hz) | 130 | 130 | 200
maximumFrequency (Hz) | 6800 | 5400 | 3500
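For convenience, the typical values in the table can be captured as a lookup keyed by sample rate. The class `TypicalMelSettings` below is a hypothetical helper written for illustration only; it is not part of Sphinx, and it covers only the three sample rates listed above.

```java
final class TypicalMelSettings {
    final int numberFilters;
    final double minimumFrequency; // Hz
    final double maximumFrequency; // Hz

    private TypicalMelSettings(int numberFilters, double minimumFrequency,
                               double maximumFrequency) {
        this.numberFilters = numberFilters;
        this.minimumFrequency = minimumFrequency;
        this.maximumFrequency = maximumFrequency;
    }

    /** Returns the typical values from the table for the given sample rate in Hz. */
    static TypicalMelSettings forSampleRate(int sampleRateHz) {
        switch (sampleRateHz) {
            case 16000: return new TypicalMelSettings(40, 130.0, 6800.0);
            case 11025: return new TypicalMelSettings(36, 130.0, 5400.0);
            case 8000:  return new TypicalMelSettings(31, 200.0, 3500.0);
            default:
                throw new IllegalArgumentException(
                        "No typical values listed for " + sampleRateHz + " Hz");
        }
    }

    public static void main(String[] args) {
        TypicalMelSettings s = forSampleRate(8000);
        System.out.println(s.numberFilters + " filters, "
                + s.minimumFrequency + " Hz to " + s.maximumFrequency + " Hz");
    }
}
```

In each column the maximum frequency stays below half the sample rate, consistent with the Nyquist constraint described earlier.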
Davis and Mermelstein showed that mel-frequency cepstral coefficients have robust characteristics that make them well suited to speech recognition. For details, see Davis and Mermelstein, "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences", IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980.
See Also: MelFilter
Field Summary
static java.lang.String | PROP_MAX_FREQ | The name of the Sphinx Property for the maximum frequency covered by the filterbank.
static double | PROP_MAX_FREQ_DEFAULT | The default value of PROP_MAX_FREQ.
static java.lang.String | PROP_MIN_FREQ | The name of the Sphinx Property for the minimum frequency covered by the filterbank.
static double | PROP_MIN_FREQ_DEFAULT | The default value of PROP_MIN_FREQ.
static java.lang.String | PROP_NUMBER_FILTERS | The name of the Sphinx Property for the number of filters in the filterbank.
static int | PROP_NUMBER_FILTERS_DEFAULT | The default value of PROP_NUMBER_FILTERS.
Constructor Summary
MelFrequencyFilterBank()
Method Summary
Data | getData() | Reads the next Data object, which is the power spectrum of an audio input frame.
void | initialize() | Initializes this DataProcessor.
void | newProperties(PropertySheet ps) | This method is called when this configurable component has new data.
void | register(java.lang.String name, Registry registry) | Register my properties.
Methods inherited from class edu.cmu.sphinx.frontend.BaseDataProcessor: getName, getPredecessor, getTimer, setPredecessor, toString
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail |
public static final java.lang.String PROP_NUMBER_FILTERS
public static final int PROP_NUMBER_FILTERS_DEFAULT
public static final java.lang.String PROP_MIN_FREQ
public static final double PROP_MIN_FREQ_DEFAULT
public static final java.lang.String PROP_MAX_FREQ
public static final double PROP_MAX_FREQ_DEFAULT
Constructor Detail |
public MelFrequencyFilterBank()
Method Detail |
public void register(java.lang.String name, Registry registry) throws PropertyException
Description copied from interface: Configurable
Specified by: register in interface Configurable
Overrides: register in class BaseDataProcessor
Throws: PropertyException
public void newProperties(PropertySheet ps) throws PropertyException
Description copied from interface: Configurable
Specified by: newProperties in interface Configurable
Overrides: newProperties in class BaseDataProcessor
Throws: PropertyException
public void initialize()
Description copied from class: BaseDataProcessor
Specified by: initialize in interface DataProcessor
Overrides: initialize in class BaseDataProcessor
public Data getData() throws DataProcessingException
Specified by: getData in interface DataProcessor
Overrides: getData in class BaseDataProcessor
Throws: DataProcessingException - if there is a data processing error