Packages | |
edu.cmu.sphinx.decoder | Provides a set of high level classes that can be used to configure and initiate the speech recognition decoding process. |
edu.cmu.sphinx.decoder.pruner | Provides an interface that represents the pruning facility. |
edu.cmu.sphinx.decoder.scorer | Provides an interface that represents entities that can be scored, and an interface and several implementations of a scorer that can score these entities. |
edu.cmu.sphinx.decoder.search | Provides classes and interfaces that are used to manage the search through the search graph. |
edu.cmu.sphinx.frontend | Provides a set of high level classes and interfaces that are used to perform digital signal processing for speech recognition. |
edu.cmu.sphinx.frontend.endpoint | Provides classes and interfaces used for speech endpointing. |
edu.cmu.sphinx.frontend.feature | Provides classes that process features. |
edu.cmu.sphinx.frontend.filter | Provides classes that implement frequency filters. |
edu.cmu.sphinx.frontend.frequencywarp | Provides classes that perform frequency warping. |
edu.cmu.sphinx.frontend.transform | Provides classes that transform data from one domain into another. |
edu.cmu.sphinx.frontend.util | Provides classes that are generally useful to the various frontend classes. |
edu.cmu.sphinx.frontend.window | Provides classes that implement windowing functions. |
edu.cmu.sphinx.instrumentation | Provides a set of classes that monitor and track operational aspects of the Sphinx system. |
edu.cmu.sphinx.jsapi | Provides support for the Java Speech API for Sphinx-4. |
edu.cmu.sphinx.linguist | Provides a set of interfaces and classes that are used to define the search graph used by the decoder. |
edu.cmu.sphinx.linguist.acoustic | Provides classes that represent the acoustic model. |
edu.cmu.sphinx.linguist.acoustic.tiedstate | Provides classes that represent the acoustic model in terms of a set of tied states. |
edu.cmu.sphinx.linguist.acoustic.trivial | Provides classes that represent a trivial acoustic model. |
edu.cmu.sphinx.linguist.dflat | |
edu.cmu.sphinx.linguist.dictionary | Provides a generic interface to a dictionary as well as several implementations. |
edu.cmu.sphinx.linguist.flat | Provides an implementation of the Linguist that statically represents the search space as a flat graph, where each word in the vocabulary has its own branch. |
edu.cmu.sphinx.linguist.language.grammar | Provides classes and interfaces that can be used to represent a graph of words and word transitions. |
edu.cmu.sphinx.linguist.language.ngram | Provides classes and interfaces that represent a stochastic language model. |
edu.cmu.sphinx.linguist.language.ngram.large | Provides an implementation of the LanguageModel interface. |
edu.cmu.sphinx.linguist.lextree | Provides an implementation of the Linguist that represents the search space as a lex tree. |
edu.cmu.sphinx.linguist.util | Provides a set of classes that are useful to implementations of the Linguist interface. |
edu.cmu.sphinx.model.acoustic.TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz | |
edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz | |
edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz | |
edu.cmu.sphinx.recognizer | Provides a set of high level classes and interfaces that are used to perform speech recognition with the Sphinx-4 speech recognition system. |
edu.cmu.sphinx.research.parallel | Provides a search manager (and supporting classes) that can perform recognition on parallel feature streams. |
edu.cmu.sphinx.result | Provides a set of classes that represent the result of a recognition. |
edu.cmu.sphinx.tools.audio | Provides a tool that records and displays the waveform and spectrogram of an audio signal. |
edu.cmu.sphinx.tools.batch | Provides a tool that performs batch-mode speech recognition. |
edu.cmu.sphinx.tools.feature | Provides a tool that generates different types of features (MFCC, PLP, spectrum) from audio files. |
edu.cmu.sphinx.tools.live | Provides a tool that performs pseudo-live-mode speech recognition. |
edu.cmu.sphinx.tools.tags | Provides tools to post-process JSGF RuleParse objects using ECMAScript Action Tags for JSGF. |
edu.cmu.sphinx.util | Provides a set of general purpose utility classes for Sphinx. |
edu.cmu.sphinx.util.props | Provides a mechanism for managing persistent configuration data. |
Sphinx-4 is a speech recognition system written entirely in the Java(TM) programming language.
The diagram below shows the general architecture of Sphinx-4, followed by a description of each block:
Figure 1: Architecture diagram of Sphinx-4.
Recognizer - Contains the main components of Sphinx-4, which are the front end, the linguist, and the decoder. The application interacts with the Sphinx-4 system mainly via the Recognizer.
Audio - The data to be decoded. This is audio in most systems, but the system can also be configured to accept other forms of data, e.g., spectral or cepstral data.
Front End - Performs digital signal processing (DSP) on the incoming data.
Feature - The output of the front end is a sequence of features, which are used for decoding in the rest of the system.
Linguist - Embodies the linguistic knowledge of the system, which comprises the acoustic model, the dictionary, and the language model. The linguist produces a search graph structure on which the search manager performs search using different algorithms.
Acoustic Model - Contains a representation (often statistical) of a sound, often created by training on a large amount of acoustic data.
Dictionary - Responsible for determining how a word is pronounced.
Language Model - Contains a representation (often statistical) of the probability of occurrence of words.
Search Graph - The graph structure produced by the linguist according to certain criteria (e.g., the grammar), using knowledge from the dictionary, the acoustic model, and the language model.
Decoder - Contains the search manager.
Search Manager - Performs search using a particular algorithm, e.g., breadth-first search, best-first search, or depth-first search. Also contains the feature scorer and the pruner.
Active List - A list of tokens representing all the states in the search graph that are active in the current feature frame.
Scorer - Scores the current feature frame against all the active states in the ActiveList.
Pruner - Prunes the active list according to certain strategies.
Result - The decoded result, which usually contains the N-best results.
Configuration Manager - Loads the Sphinx-4 configuration data from an XML-based file, and manages the component life cycle for objects.
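
The interplay of the search manager, active list, scorer, and pruner described above can be illustrated with a small, self-contained sketch. It is a toy example and does not use the Sphinx-4 API: the class BeamSearchSketch, its Token class, and the methods expand, score, prune, and decode are all hypothetical names chosen for illustration. The search graph is assumed to be a simple left-to-right chain of states, each feature frame is assumed to be an array of per-state log likelihoods, and the pruner is assumed to combine a relative beam with a cap on the number of active tokens.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy frame-synchronous search; hypothetical names, not the Sphinx-4 API. */
public class BeamSearchSketch {

    /** A token pairs a search-graph state with its accumulated log score. */
    static final class Token {
        final int state;
        final double logScore;

        Token(int state, double logScore) {
            this.state = state;
            this.logScore = logScore;
        }
    }

    /** Grows the active list: each state may loop on itself or advance to the
     *  next state (assumed chain graph). Only the best token per state is kept. */
    static List<Token> expand(List<Token> active, int numStates) {
        Map<Integer, Token> best = new HashMap<>();
        for (Token t : active) {
            int last = Math.min(t.state + 1, numStates - 1);
            for (int next = t.state; next <= last; next++) {
                Token old = best.get(next);
                if (old == null || t.logScore > old.logScore) {
                    best.put(next, new Token(next, t.logScore));
                }
            }
        }
        return new ArrayList<>(best.values());
    }

    /** Scorer: adds the current frame's log likelihood for each active state. */
    static List<Token> score(List<Token> active, double[] frame) {
        List<Token> scored = new ArrayList<>();
        for (Token t : active) {
            scored.add(new Token(t.state, t.logScore + frame[t.state]));
        }
        return scored;
    }

    /** Pruner: keeps at most maxActive tokens whose score lies within
     *  'beam' of the best score. The result is sorted best first. */
    static List<Token> prune(List<Token> active, double beam, int maxActive) {
        List<Token> sorted = new ArrayList<>(active);
        sorted.sort(Comparator.comparingDouble((Token t) -> -t.logScore));
        List<Token> kept = new ArrayList<>();
        if (sorted.isEmpty()) {
            return kept;
        }
        double threshold = sorted.get(0).logScore - beam;
        for (Token t : sorted) {
            if (kept.size() < maxActive && t.logScore >= threshold) {
                kept.add(t);
            }
        }
        return kept;
    }

    /** Search loop: for every frame, grow, score, and prune the active list. */
    static Token decode(double[][] frames, double beam, int maxActive) {
        int numStates = frames[0].length;
        List<Token> active = new ArrayList<>();
        active.add(new Token(0, 0.0)); // the search starts in state 0
        for (int f = 0; f < frames.length; f++) {
            if (f > 0) {
                active = expand(active, numStates);
            }
            active = prune(score(active, frames[f]), beam, maxActive);
        }
        return active.get(0); // best surviving token
    }

    public static void main(String[] args) {
        // Made-up log likelihoods: rows are frames, columns are states.
        double[][] frames = {
            {-1.0, -5.0, -5.0},
            {-4.0, -1.0, -5.0},
            {-5.0, -4.0, -1.0}
        };
        Token best = decode(frames, 10.0, 2);
        System.out.println("best final state = " + best.state
                + ", log score = " + best.logScore);
    }
}
```

With the made-up frames in main, the best surviving token ends in state 2 with a log score of -3.0. Narrowing the beam or lowering maxActive discards more tokens per frame, which trades accuracy for speed; a real decoder would also keep back-pointers in each token so that the word sequence can be recovered for the result.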