edu.cmu.sphinx.frontend.transform
Class DiscreteFourierTransform

java.lang.Object
  extended by edu.cmu.sphinx.frontend.BaseDataProcessor
      extended by edu.cmu.sphinx.frontend.transform.DiscreteFourierTransform
All Implemented Interfaces:
Configurable, DataProcessor

public class DiscreteFourierTransform
extends BaseDataProcessor

Computes the Discrete Fourier Transform (DFT) of an input sequence, using the Fast Fourier Transform (FFT) algorithm. The Fourier transform decomposes a signal into its frequency components. In speech, rather than analyzing the signal over its entire duration, we analyze one window of audio data at a time. This window is the product of applying a sliding Hamming window to the signal. Moreover, since the amplitude is far more important than the phase for speech recognition, this class returns the power spectrum of that window of data instead of the complex spectrum. Each value in the returned spectrum represents the strength of a particular frequency in that window of data.
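To make "power spectrum instead of complex spectrum" concrete, the following is a minimal sketch that computes the squared magnitude of each frequency bin of one real-valued window. It deliberately uses a naive O(N^2) DFT rather than an FFT, so it is not how this class is implemented; the class name PowerSpectrumSketch and the method powerSpectrum are hypothetical names chosen for illustration.

```java
// Illustrative sketch only: a naive O(N^2) DFT, not the FFT used by this class.
final class PowerSpectrumSketch {

    /** Returns the squared magnitude |X[k]|^2 for k = 0 .. N/2 of a real-valued window. */
    static double[] powerSpectrum(double[] window) {
        int n = window.length;
        double[] power = new double[n / 2 + 1];
        for (int k = 0; k < power.length; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re += window[t] * Math.cos(angle);
                im -= window[t] * Math.sin(angle);
            }
            // The phase is discarded; only the squared magnitude is kept.
            power[k] = re * re + im * im;
        }
        return power;
    }

    public static void main(String[] args) {
        // A cosine that completes exactly 2 cycles in an 8-sample window.
        double[] window = new double[8];
        for (int t = 0; t < window.length; t++) {
            window[t] = Math.cos(2.0 * Math.PI * 2 * t / window.length);
        }
        double[] power = powerSpectrum(window);
        for (int k = 0; k < power.length; k++) {
            System.out.println("bin " + k + ": " + power[k]);
        }
    }
}
```

For this test window, the only non-negligible value is at bin 2, which matches the two cycles the cosine completes within the window.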

By default, the number of FFT points is the smallest power of 2 that is equal to or larger than the number of samples in the incoming window of data. The number of FFT points can also be set by the user with the property defined by PROP_NUMBER_FFT_POINTS. The length of the returned power spectrum is the number of FFT points, divided by 2, plus 1. Since the input signal is real, its FFT is conjugate-symmetric, so the power spectrum is symmetric and the information contained in the whole vector is already present in its first half.
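The sizing rule in this paragraph can be sketched as follows. This is an illustration of the arithmetic only, not the class's actual code; FftSizingSketch, fftPoints, and spectrumLength are hypothetical names, and the 410-sample window is an arbitrary example value.

```java
// Illustrative sketch of the default sizing rule described above.
final class FftSizingSketch {

    /** Smallest power of 2 that is equal to or larger than windowSize. */
    static int fftPoints(int windowSize) {
        if (windowSize < 1 || windowSize > (1 << 30)) {
            throw new IllegalArgumentException("unsupported window size: " + windowSize);
        }
        int points = 1;
        while (points < windowSize) {
            points <<= 1; // double until the window fits
        }
        return points;
    }

    /** Length of the returned power spectrum: FFT points / 2 + 1. */
    static int spectrumLength(int fftPoints) {
        return fftPoints / 2 + 1;
    }

    public static void main(String[] args) {
        int windowSize = 410; // arbitrary example window length in samples
        int points = fftPoints(windowSize);
        // prints "410 samples -> 512 FFT points -> 257 spectrum values"
        System.out.println(windowSize + " samples -> " + points
                + " FFT points -> " + spectrumLength(points) + " spectrum values");
    }
}
```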

Note that each call to getData only returns the spectrum of one window of data. To display the spectrogram of the entire original audio, one has to collect all the spectra from all the windows generated from the original data. A spectrogram is a two-dimensional representation of three-dimensional information. The horizontal axis represents time. The vertical axis represents frequency. If we slice the spectrogram at a given time, we get the spectrum computed as the short-term Fourier transform of the signal windowed around that time stamp. The intensity of the spectrum at each frequency and time frame is given by the color in the graph, or by the darkness in a gray-scale plot. The spectrogram can be thought of as a view from the top of a surface generated by concatenating the spectral vectors obtained from the windowed signal.
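Collecting one spectrum per call into a spectrogram might look like the sketch below. To stay self-contained it does not use the Sphinx classes: a plain Supplier of double arrays stands in for repeated getData calls and is assumed to return null once no more spectra are available. SpectrogramSketch and collect are hypothetical names, and the handling of Signal objects is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative sketch: stacks per-window spectra into a frequency-by-time grid.
final class SpectrogramSketch {

    /**
     * Drains spectra from the source until it returns null, then arranges
     * them as image[frequencyBin][timeFrame], so that rows follow the
     * vertical (frequency) axis and columns follow the horizontal (time) axis.
     */
    static double[][] collect(Supplier<double[]> nextSpectrum) {
        List<double[]> frames = new ArrayList<>();
        for (double[] s = nextSpectrum.get(); s != null; s = nextSpectrum.get()) {
            frames.add(s);
        }
        if (frames.isEmpty()) {
            return new double[0][0];
        }
        int bins = frames.get(0).length;
        double[][] image = new double[bins][frames.size()];
        for (int t = 0; t < frames.size(); t++) {
            double[] frame = frames.get(t);
            if (frame.length != bins) {
                throw new IllegalArgumentException("spectra differ in length");
            }
            for (int f = 0; f < bins; f++) {
                image[f][t] = frame[f];
            }
        }
        return image;
    }

    public static void main(String[] args) {
        // Two made-up spectra of three bins each, standing in for real output.
        List<double[]> fake = new ArrayList<>();
        fake.add(new double[] {1.0, 2.0, 3.0});
        fake.add(new double[] {4.0, 5.0, 6.0});
        java.util.Iterator<double[]> it = fake.iterator();
        double[][] image = collect(() -> it.hasNext() ? it.next() : null);
        System.out.println(image.length + " bins x " + image[0].length + " frames");
    }
}
```

Each column of the resulting grid is one time slice of the spectrogram; mapping the values to color or gray level is left to the plotting code.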

For example, Figure 1 below shows the audio signal of the utterance "one three nine oh", and Figure 2 shows its spectrogram, produced by putting together all the spectra returned by this FFT. Frequency is on the vertical axis, and time is on the horizontal axis. The darkness of the shade represents the strength of that frequency at that point in time:



Figure 1: The audio signal of the utterance "one three nine oh".



Figure 2: The spectrogram of the utterance "one three nine oh" in Figure 1.


Field Summary
static java.lang.String PROP_NUMBER_FFT_POINTS
          The name of the SphinxProperty for the number of points in the Fourier Transform.
 
Constructor Summary
DiscreteFourierTransform()
           
 
Method Summary
 Data getData()
          Reads the next DoubleData object, which is a data frame from which we'll compute the power spectrum.
 void initialize()
          Initializes this DataProcessor.
 void newProperties(PropertySheet ps)
          This method is called when this configurable component has new data.
 void register(java.lang.String name, Registry registry)
          Register my properties.
 
Methods inherited from class edu.cmu.sphinx.frontend.BaseDataProcessor
getName, getPredecessor, getTimer, setPredecessor, toString
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PROP_NUMBER_FFT_POINTS

public static final java.lang.String PROP_NUMBER_FFT_POINTS
The name of the SphinxProperty for the number of points in the Fourier Transform.

See Also:
Constant Field Values
Constructor Detail

DiscreteFourierTransform

public DiscreteFourierTransform()
Method Detail

register

public void register(java.lang.String name,
                     Registry registry)
              throws PropertyException
Description copied from interface: Configurable
Register my properties. This method is called once early in the lifetime of the component, shortly after the component is constructed. The component should register any configuration properties that it needs. If this configurable extends another configurable, super.register should also be called.

Specified by:
register in interface Configurable
Overrides:
register in class BaseDataProcessor
Throws:
PropertyException

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component has new data. The component should first validate the data. If it is bad the component should return false. If the data is good, the component should record the data internally and return true.

Specified by:
newProperties in interface Configurable
Overrides:
newProperties in class BaseDataProcessor
Throws:
PropertyException

initialize

public void initialize()
Description copied from class: BaseDataProcessor
Initializes this DataProcessor. This is typically called after the DataProcessor has been configured.

Specified by:
initialize in interface DataProcessor
Overrides:
initialize in class BaseDataProcessor

getData

public Data getData()
             throws DataProcessingException
Reads the next DoubleData object, which is a data frame from which we'll compute the power spectrum. Signal objects just pass through unmodified.

Specified by:
getData in interface DataProcessor
Specified by:
getData in class BaseDataProcessor
Returns:
the next available power spectrum DoubleData object, or null if no Spectrum object is available
Throws:
DataProcessingException - if there is a processing error