Sphinx-4

Sphinx-4 is a speech recognition system written entirely in the Java(TM) programming language.


Packages
edu.cmu.sphinx.decoder Provides a set of high level classes that can be used to configure and initiate the speech recognition decoding process.
edu.cmu.sphinx.decoder.pruner Provides an interface that represents the pruning facility.
edu.cmu.sphinx.decoder.scorer Provides an interface that represents entities that can be scored, and an interface and several implementations of a scorer that can score these entities.
edu.cmu.sphinx.decoder.search Provides classes and interfaces that are used to manage the search through the search graph.
edu.cmu.sphinx.frontend Provides a set of high level classes and interfaces that are used to perform digital signal processing for speech recognition.
edu.cmu.sphinx.frontend.endpoint Provides classes and interfaces used for speech endpointing.
edu.cmu.sphinx.frontend.feature Provides classes that process features.
edu.cmu.sphinx.frontend.filter Provides classes that implement frequency filters.
edu.cmu.sphinx.frontend.frequencywarp Provides classes that perform frequency warping.
edu.cmu.sphinx.frontend.transform Provides classes that transform data from one domain into another.
edu.cmu.sphinx.frontend.util Provides classes that are generally useful to the various frontend classes.
edu.cmu.sphinx.frontend.window Provides classes that implement windowing functions.
edu.cmu.sphinx.instrumentation Provides a set of classes that monitor and track operational aspects of the Sphinx system.
edu.cmu.sphinx.jsapi Provides support for the Java Speech API for Sphinx-4.
edu.cmu.sphinx.linguist Provides a set of interfaces and classes that are used to define the search graph used by the decoder.
edu.cmu.sphinx.linguist.acoustic Provides classes that represent the acoustic model.
edu.cmu.sphinx.linguist.acoustic.tiedstate Provides classes that represent the acoustic model in terms of a set of tied states.
edu.cmu.sphinx.linguist.acoustic.trivial Provides classes that represent a trivial acoustic model.
edu.cmu.sphinx.linguist.dflat  
edu.cmu.sphinx.linguist.dictionary Provides a generic interface to a dictionary as well as several implementations.
edu.cmu.sphinx.linguist.flat Provides an implementation of the Linguist that statically represents the search space as a flat graph, where each word in the vocabulary has its own branch.
edu.cmu.sphinx.linguist.language.grammar Provides classes and interfaces that can be used to represent a graph of words and word transitions.
edu.cmu.sphinx.linguist.language.ngram Provides classes and interfaces that represent a stochastic language model.
edu.cmu.sphinx.linguist.language.ngram.large Provides an implementation of the LanguageModel interface.
edu.cmu.sphinx.linguist.lextree Provides an implementation of the Linguist that represents the search space as a lex tree.
edu.cmu.sphinx.linguist.util Provides a set of classes that are useful to implementations of the Linguist interface.
edu.cmu.sphinx.model.acoustic.TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800Hz  
edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz  
edu.cmu.sphinx.model.acoustic.WSJ_8gau_13dCep_8kHz_31mel_200Hz_3500Hz  
edu.cmu.sphinx.recognizer Provides a set of high level classes and interfaces that are used to perform speech recognition with the Sphinx-4 speech recognition system.
edu.cmu.sphinx.research.parallel Provides a search manager (and supporting classes) that can perform recognition on parallel feature streams.
edu.cmu.sphinx.result Provides a set of classes that represent the result of a recognition.
edu.cmu.sphinx.tools.audio Provides a tool that records and displays the waveform and spectrogram of an audio signal.
edu.cmu.sphinx.tools.batch Provides a tool that performs batch-mode speech recognition.
edu.cmu.sphinx.tools.feature Provides a tool that generates different types of features (MFCC, PLP, spectrum) from audio files.
edu.cmu.sphinx.tools.live Provides a tool that performs pseudo-live-mode speech recognition.
edu.cmu.sphinx.tools.tags Provides tools to post-process JSGF RuleParse objects using ECMAScript Action Tags for JSGF.
edu.cmu.sphinx.util Provides a set of general purpose utility classes for Sphinx.
edu.cmu.sphinx.util.props Provides a mechanism for managing persistent configuration data.

 


The diagram below shows the general architecture of Sphinx-4, followed by a description of each block:


Figure 1: Architecture diagram of Sphinx-4.

Recognizer - Contains the main components of Sphinx-4, which are the front end, the linguist, and the decoder. The application interacts with the Sphinx-4 system mainly via the Recognizer.

Audio - The data to be decoded. This is audio in most systems, but the system can also be configured to accept other forms of data, e.g., spectral or cepstral data.

Front End - Performs digital signal processing (DSP) on the incoming data.

Feature - The output of the front end is a sequence of features, which are used for decoding in the rest of the system.
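
To make the front-end idea concrete, here is a minimal sketch of a processing chain in which each stage transforms the data produced by the previous one. The class FrontEndSketch, its methods, and the choice of stages (a pre-emphasis filter followed by a Hamming window) are assumptions made for illustration; they are not the Sphinx-4 front-end API, and a real front end would continue through further stages to produce features.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; not part of the Sphinx-4 API.
final class FrontEndSketch {

    /** Pre-emphasis filter: y[n] = x[n] - factor * x[n-1], with x[-1] taken as 0. */
    static double[] preemphasize(double[] samples, double factor) {
        double[] out = new double[samples.length];
        double previous = 0.0;
        for (int i = 0; i < samples.length; i++) {
            out[i] = samples[i] - factor * previous;
            previous = samples[i];
        }
        return out;
    }

    /** Multiplies a frame by a Hamming window: w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1)). */
    static double[] hammingWindow(double[] frame) {
        int n = frame.length;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            // A single-sample frame is passed through unchanged to avoid dividing by zero.
            double w = (n == 1) ? 1.0 : 0.54 - 0.46 * Math.cos(2.0 * Math.PI * i / (n - 1));
            out[i] = frame[i] * w;
        }
        return out;
    }

    /** Runs the input through every stage in order and returns the final output. */
    static double[] process(double[] input, List<UnaryOperator<double[]>> stages) {
        double[] data = input;
        for (UnaryOperator<double[]> stage : stages) {
            data = stage.apply(data);
        }
        return data;
    }
}
```

Modelling each stage as a function from data to data keeps the chain easy to reorder or extend, which mirrors the idea of a configurable front end described above.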

Linguist - Embodies the linguistic knowledge of the system, which comprises the acoustic model, the dictionary, and the language model. The linguist produces a search graph structure on which the search manager performs search using different algorithms.

Acoustic Model - Contains a representation (often statistical) of a sound, usually created by training on large amounts of acoustic data.

Dictionary - Responsible for determining how a word is pronounced.
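
As a rough illustration of the dictionary's role, the sketch below maps a word to one or more pronunciations, each a sequence of phone symbols. The class DictionarySketch, its methods, and the phone symbols used in the usage note are hypothetical and chosen only for this example; they do not reproduce the interface in edu.cmu.sphinx.linguist.dictionary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the Sphinx-4 Dictionary interface.
final class DictionarySketch {

    // Each word maps to a list of alternative pronunciations.
    private final Map<String, List<String[]>> entries = new HashMap<>();

    /** Registers one pronunciation (a sequence of phone symbols) for a word. */
    void add(String word, String... phones) {
        entries.computeIfAbsent(word.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
               .add(phones);
    }

    /** Returns all known pronunciations, or an empty list for an unknown word. */
    List<String[]> pronunciations(String word) {
        return entries.getOrDefault(word.toLowerCase(Locale.ROOT),
                                    Collections.<String[]>emptyList());
    }
}
```

For example, after add("one", "W", "AH", "N"), a lookup of "ONE" returns a single pronunciation, because keys are normalised to lower case. Storing a list per word leaves room for words with several accepted pronunciations.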

Language Model - Contains a representation (often statistical) of the probability of occurrence of words.
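
One simple statistical representation of word occurrence is a bigram model, which estimates the probability of a word given the word before it. The sketch below uses unsmoothed relative-frequency estimates, P(w2 | w1) = count(w1 w2) / count(w1). The class BigramSketch and this estimation method are assumptions for illustration; practical language models normally add smoothing or backoff, and this is not the Sphinx-4 LanguageModel interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; unsmoothed maximum-likelihood bigram estimates.
final class BigramSketch {

    private final Map<String, Integer> historyCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    /** Counts every adjacent word pair in the given word sequence. */
    void train(String... words) {
        for (int i = 0; i + 1 < words.length; i++) {
            historyCounts.merge(words[i], 1, Integer::sum);
            bigramCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
    }

    /** Returns count(previous word) / count(previous), or 0 if previous was never seen as a history. */
    double probability(String previous, String word) {
        int history = historyCounts.getOrDefault(previous, 0);
        if (history == 0) {
            return 0.0;
        }
        return bigramCounts.getOrDefault(previous + " " + word, 0) / (double) history;
    }
}
```

After training on the sequence "one two one three", the word "one" has been followed once by "two" and once by "three", so probability("one", "two") is 0.5.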

Search Graph - The graph structure produced by the linguist according to certain criteria (e.g., the grammar), using knowledge from the dictionary, the acoustic model, and the language model.

Decoder - Contains the search manager.

Search Manager - Performs the search using a particular algorithm, e.g., breadth-first search, best-first search, or depth-first search. Also contains the feature scorer and the pruner.

Active List - A list of tokens representing all the states in the search graph that are active in the current feature frame.

Scorer - Scores the current feature frame against all the active states in the ActiveList.

Pruner - Prunes the active list according to certain strategies.
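
The interplay between the active list and the pruner can be sketched with beam pruning, one common pruning strategy. The sketch assumes each token carries a log-domain score, keeps only tokens whose score lies within a relative beam of the best score, and caps the list at an absolute size. The class BeamPruningSketch, the Token fields, and the two beam parameters are hypothetical; the strategies actually used by Sphinx-4 are defined by its Pruner implementations.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not the Sphinx-4 Pruner or ActiveList API.
final class BeamPruningSketch {

    /** A token pairs a search-graph state with the score of the best path reaching it. */
    static final class Token {
        final String state;
        final double logScore;

        Token(String state, double logScore) {
            this.state = state;
            this.logScore = logScore;
        }
    }

    /**
     * Returns the surviving tokens, best first. A token survives if its score is
     * within relativeBeam of the best score and fewer than absoluteBeam tokens
     * have already been kept.
     */
    static List<Token> prune(List<Token> activeList, double relativeBeam, int absoluteBeam) {
        List<Token> sorted = new ArrayList<>(activeList);
        sorted.sort(Comparator.comparingDouble((Token t) -> t.logScore).reversed());

        List<Token> kept = new ArrayList<>();
        if (sorted.isEmpty()) {
            return kept;
        }
        double threshold = sorted.get(0).logScore - relativeBeam;
        for (Token token : sorted) {
            // The list is sorted, so the first failing token ends the scan.
            if (kept.size() >= absoluteBeam || token.logScore < threshold) {
                break;
            }
            kept.add(token);
        }
        return kept;
    }
}
```

In a frame-by-frame search, a step like this would run after the scorer has updated every token for the current feature frame, so that only the most promising states are expanded in the next frame.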

Result - The decoded result, which usually contains the N-best results.

Configuration Manager - Loads the Sphinx-4 configuration data from an XML-based file and manages the component life cycle for objects.
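
To illustrate what loading component definitions from XML might involve, the sketch below parses a small XML document with the standard Java DOM parser and collects each component's name and type. The class ConfigSketch and the element and attribute names it looks for (component, name, type) are assumptions made for this example; consult the edu.cmu.sphinx.util.props package for the actual configuration format and API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not the Sphinx-4 ConfigurationManager.
final class ConfigSketch {

    /**
     * Returns a map from component name to component type, in document order.
     * The "component", "name" and "type" names are assumed for this sketch.
     */
    static Map<String, String> componentTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("component");
            Map<String, String> types = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element component = (Element) nodes.item(i);
                types.put(component.getAttribute("name"), component.getAttribute("type"));
            }
            return types;
        } catch (Exception e) {
            // Wrap parser errors so callers need not handle checked exceptions.
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }
}
```

A configuration manager built on such a map could then instantiate each component by its type name and hand out instances by component name, which is one way to centralise the component life cycle.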