Sphinx-4 is a speech recognition system written entirely in the Java(TM) programming language.

The diagram below shows the general architecture of Sphinx-4, followed by a description of each block:


Figure 1: Architecture diagram of Sphinx-4.

Recognizer - Contains the main components of Sphinx-4, which are the front end, the linguist, and the decoder. The application interacts with the Sphinx-4 system mainly via the Recognizer.

Audio - The data to be decoded. This is audio in most systems, but Sphinx-4 can also be configured to accept other forms of data, e.g., spectral or cepstral data.

Front End - Performs digital signal processing (DSP) on the incoming data.

Feature - The output of the front end is a sequence of features, which the rest of the system uses for decoding.

Linguist - Embodies the linguistic knowledge of the system: the acoustic model, the dictionary, and the language model. The linguist produces a search graph on which the search manager can search using different algorithms.

Acoustic Model - Contains a representation (often statistical) of a sound, typically created by training on large amounts of acoustic data.

Dictionary - Responsible for determining how a word is pronounced.

Language Model - Contains a representation (often statistical) of the probability of occurrence of words.
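
As an illustration of what a statistical representation of word occurrence can look like, the following is a minimal sketch of a bigram model with maximum-likelihood estimates. The class BigramModelSketch and its methods are hypothetical names chosen for this example; they are not part of Sphinx-4, and real language models normally add smoothing and back-off, which this sketch omits.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy bigram language model (hypothetical example, not a Sphinx-4 class). */
class BigramModelSketch {
    // How often each word appears as the first word of an adjacent pair.
    private final Map<String, Integer> contextCounts = new HashMap<>();
    // How often each adjacent word pair appears, keyed as "previous next".
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    /** Counts adjacent word pairs in one whitespace-separated sentence. */
    void train(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            contextCounts.merge(words[i], 1, Integer::sum);
            bigramCounts.merge(words[i] + " " + words[i + 1], 1, Integer::sum);
        }
    }

    /** Returns count(previous next) / count(previous), or 0.0 for an unseen context. */
    double probability(String previous, String next) {
        int context = contextCounts.getOrDefault(previous, 0);
        if (context == 0) {
            return 0.0;
        }
        int pair = bigramCounts.getOrDefault(previous + " " + next, 0);
        return pair / (double) context;
    }
}
```

After training on a few sentences, probability("the", "door") returns the fraction of occurrences of "the" that were followed by "door". A word pair never seen in training gets probability zero here, which is the main reason practical models apply smoothing.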

Search Graph - The graph structure produced by the linguist according to certain criteria (e.g., the grammar), using knowledge from the dictionary, the acoustic model, and the language model.

Decoder - Contains the search manager.

Search Manager - Performs the search using a particular algorithm, e.g., breadth-first search, best-first search, or depth-first search. Also contains the feature scorer and the pruner.

Active List - A list of tokens representing all the states in the search graph that are active in the current feature frame.

Scorer - Scores the current feature frame against all the active states in the active list.

Pruner - Prunes the active list according to certain strategies.
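
The interplay of active list, scorer, and pruner described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Sphinx-4 implementation: the class BeamPruningSketch, the Token type, and the relative-beam strategy are all chosen for this example, and the per-frame log-likelihood function stands in for a real acoustic scorer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Toy active list with scoring and beam pruning (hypothetical names, not the Sphinx-4 API). */
class BeamPruningSketch {

    /** A token pairs a search-graph state with its accumulated log score. */
    static final class Token {
        final String state;
        final double logScore;

        Token(String state, double logScore) {
            this.state = state;
            this.logScore = logScore;
        }
    }

    /** Scorer stand-in: adds each state's log-likelihood for the current frame to its token. */
    static List<Token> score(List<Token> activeList, ToDoubleFunction<String> frameLogLikelihood) {
        List<Token> scored = new ArrayList<>();
        for (Token t : activeList) {
            scored.add(new Token(t.state, t.logScore + frameLogLikelihood.applyAsDouble(t.state)));
        }
        return scored;
    }

    /** Pruner stand-in: keeps only tokens whose score lies within relativeBeam of the best score. */
    static List<Token> prune(List<Token> activeList, double relativeBeam) {
        double best = Double.NEGATIVE_INFINITY;
        for (Token t : activeList) {
            best = Math.max(best, t.logScore);
        }
        List<Token> kept = new ArrayList<>();
        for (Token t : activeList) {
            if (t.logScore >= best - relativeBeam) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

In a frame-synchronous search, score and prune would run once per feature frame, and the surviving tokens would then be expanded along the search graph to form the next active list. A relative beam is only one possible pruning strategy; keeping a fixed maximum number of tokens is another common choice.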

Result - The decoded result, which usually contains the N-best results.

Configuration Manager - Loads the Sphinx-4 configuration data from an XML-based file and manages the component life cycle for objects.
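
To make the idea of XML-driven component configuration concrete, here is a minimal sketch that reads component declarations from an XML string using only the standard Java XML parser. The XML layout (a component element with name and type attributes), the class ConfigSketch, and the type names in the usage note are assumptions made for illustration; consult the Sphinx-4 documentation for the actual file format and for the real configuration manager API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Toy reader for component declarations (hypothetical, not the Sphinx-4 configuration manager). */
class ConfigSketch {

    /** Maps each declared component name to its declared type, in document order. */
    static Map<String, String> componentTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Assumed layout: <component name="..." type="..."> elements anywhere in the file.
            NodeList nodes = doc.getElementsByTagName("component");
            Map<String, String> types = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element component = (Element) nodes.item(i);
                types.put(component.getAttribute("name"), component.getAttribute("type"));
            }
            return types;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers do not need checked exceptions.
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }
}
```

Given a document that declares a component named "recognizer" with the placeholder type "example.Recognizer", componentTypes returns a map from "recognizer" to "example.Recognizer". A full configuration manager would go further and instantiate each type, set its properties, and track the resulting objects through their life cycle.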