edu.cmu.sphinx.linguist
Interface Linguist

All Superinterfaces:
Configurable
All Known Implementing Classes:
DynamicFlatLinguist, FlatLinguist, LexTreeLinguist

public interface Linguist
extends Configurable

The linguist represents and manages the search space for the decoder and provides, upon request, the search graph that the decoder uses. It is a generic interface that provides language model services.

A SearchManager retrieves the search space by calling getSearchGraph, which returns a SearchGraph. The initial state in the search graph can be retrieved via a call to getInitialState. Successor states can be retrieved via calls to SearchState.getSuccessors(). A number of search state subinterfaces are used to indicate the different types of states in the search space.
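
The retrieval path described above (getSearchGraph, then getInitialState, then repeated getSuccessors calls) can be sketched roughly as follows. The nested interfaces are hypothetical, simplified stand-ins declared only for this example; they do not reproduce the real Sphinx-4 signatures, and the toy graph is invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GraphWalkSketch {
    // Hypothetical stand-ins for the types named in the text (assumption:
    // successors are returned directly as a list of states).
    interface SearchState { List<SearchState> getSuccessors(); }
    interface SearchGraph { SearchState getInitialState(); }
    interface Linguist { SearchGraph getSearchGraph(); }

    // A named state used only to build a toy graph. It relies on default
    // identity equality, which is sufficient for this static example.
    static final class ToyState implements SearchState {
        final String name;
        final List<SearchState> next = new ArrayList<>();
        ToyState(String name) { this.name = name; }
        @Override public List<SearchState> getSuccessors() { return next; }
        @Override public String toString() { return name; }
    }

    // Breadth-first walk: start at the initial state and follow successors,
    // visiting each state once.
    static List<String> walk(Linguist linguist) {
        SearchGraph graph = linguist.getSearchGraph();
        SearchState initial = graph.getInitialState();
        List<String> visitedOrder = new ArrayList<>();
        Set<SearchState> seen = new HashSet<>();
        Deque<SearchState> queue = new ArrayDeque<>();
        queue.add(initial);
        seen.add(initial);
        while (!queue.isEmpty()) {
            SearchState state = queue.remove();
            visitedOrder.add(state.toString());
            for (SearchState successor : state.getSuccessors()) {
                if (seen.add(successor)) {
                    queue.add(successor);
                }
            }
        }
        return visitedOrder;
    }

    // Toy graph: start -> a -> end and start -> b -> end.
    static Linguist toyLinguist() {
        ToyState start = new ToyState("start");
        ToyState a = new ToyState("a");
        ToyState b = new ToyState("b");
        ToyState end = new ToyState("end");
        start.next.add(a);
        start.next.add(b);
        a.next.add(end);
        b.next.add(end);
        SearchGraph graph = () -> start;
        return () -> graph;
    }

    public static void main(String[] args) {
        System.out.println(walk(toyLinguist())); // prints [start, a, b, end]
    }
}
```

A breadth-first walk is used here only because it is short; an actual SearchManager is free to expand states in whatever order its search algorithm requires.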

A linguist has a great deal of latitude in the order in which it returns states. For instance, a 'flat' linguist may return a WordState at the beginning of a word, while a 'tree' linguist may return WordStates at the end of a word. Likewise, a linguist may omit certain state types completely (such as a unit state). Some SearchManagers may want to know a priori the order in which different state types will be generated by the linguist. The method SearchGraph.getNumStateOrder() can be used to retrieve the number of state types that will be returned by the linguist. The method SearchState.getOrder() returns the ranking for a particular state.
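
As an illustration of how a SearchManager might use these two methods, the sketch below pre-allocates one bucket per state type and files each state under its ranking. The nested interfaces are hypothetical stand-ins that model only the two methods named above, and the sketch assumes that getOrder() yields values from 0 to getNumStateOrder() - 1; that range is an assumption, not something stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

class StateOrderSketch {
    // Hypothetical stand-ins; only the two methods from the text are modeled.
    interface SearchState { int getOrder(); }
    interface SearchGraph { int getNumStateOrder(); }

    // Groups states into one bucket per order value.
    // Assumption: 0 <= getOrder() < getNumStateOrder().
    static List<List<SearchState>> bucketByOrder(SearchGraph graph,
                                                 List<SearchState> states) {
        List<List<SearchState>> buckets = new ArrayList<>();
        for (int i = 0; i < graph.getNumStateOrder(); i++) {
            buckets.add(new ArrayList<>());
        }
        for (SearchState state : states) {
            buckets.get(state.getOrder()).add(state);
        }
        return buckets;
    }

    // Builds four toy states with orders 2, 0, 1, 0 and buckets them.
    static List<List<SearchState>> demo() {
        SearchGraph graph = () -> 3;
        List<SearchState> states = new ArrayList<>();
        for (int order : new int[] {2, 0, 1, 0}) {
            states.add(() -> order);
        }
        return bucketByOrder(graph, states);
    }

    public static void main(String[] args) {
        List<List<SearchState>> buckets = demo();
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("order " + i + ": " + buckets.get(i).size() + " state(s)");
        }
    }
}
```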

Depending on the vocabulary size and topology, the search space represented by the linguist may include a very large number of states. Some linguists generate the search states dynamically; that is, the object representing a particular state in the search space is not created until it is needed by the SearchManager. SearchManagers often need to determine whether a particular state has been entered before by comparing states. Because SearchStates may be generated dynamically, the SearchState.equals() call (as opposed to the reference equality operator '==') should be used to determine if states are equal. The states returned by the linguist will generally provide very efficient implementations of equals and hashCode. This allows a SearchManager to maintain collections of states in HashMaps efficiently.
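
The difference between '==' and equals() for dynamically generated states can be seen in a small sketch. DynamicState and its two fields are hypothetical and exist only for this example; a real linguist would define equality over whatever identifies a state in its own search space.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class StateEqualitySketch {
    // Hypothetical dynamically generated state: a new object is created on
    // every request, so reference equality cannot detect a revisit.
    static final class DynamicState {
        private final String word;
        private final int position;

        DynamicState(String word, int position) {
            this.word = word;
            this.position = position;
        }

        @Override public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof DynamicState)) return false;
            DynamicState that = (DynamicState) other;
            return position == that.position && word.equals(that.word);
        }

        // Must agree with equals so hash-based collections work.
        @Override public int hashCode() {
            return Objects.hash(word, position);
        }
    }

    // Simulates on-demand generation: always returns a fresh object.
    static DynamicState generate(String word, int position) {
        return new DynamicState(word, position);
    }

    public static void main(String[] args) {
        DynamicState first = generate("hello", 0);
        DynamicState second = generate("hello", 0);

        System.out.println(first == second);      // prints false
        System.out.println(first.equals(second)); // prints true

        Set<DynamicState> visited = new HashSet<>();
        visited.add(first);
        System.out.println(visited.contains(second)); // prints true
    }
}
```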

The lifecycle of a linguist is as follows:
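
One plausible call order, inferred only from the method descriptions on this page (allocate, startRecognition, getSearchGraph, stopRecognition, deallocate), is sketched here. The nested Linguist interface and RecordingLinguist are hypothetical stand-ins, getSearchGraph returns a plain Object to keep the example short, and configuration through the inherited Configurable methods is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LifecycleSketch {
    // Hypothetical stand-in mirroring the five methods in the Method Summary.
    interface Linguist {
        void allocate() throws IOException;
        void startRecognition();
        Object getSearchGraph();
        void stopRecognition();
        void deallocate();
    }

    // Records each call so the resulting order can be inspected.
    static final class RecordingLinguist implements Linguist {
        final List<String> calls = new ArrayList<>();
        @Override public void allocate() { calls.add("allocate"); }
        @Override public void startRecognition() { calls.add("startRecognition"); }
        @Override public Object getSearchGraph() {
            calls.add("getSearchGraph");
            return new Object();
        }
        @Override public void stopRecognition() { calls.add("stopRecognition"); }
        @Override public void deallocate() { calls.add("deallocate"); }
    }

    // Runs a single recognition. The finally blocks pair each setup call
    // with its matching cleanup call even if the search fails.
    static void recognizeOnce(Linguist linguist) throws IOException {
        linguist.allocate();
        try {
            linguist.startRecognition();
            try {
                Object graph = linguist.getSearchGraph();
                // A search manager would walk the graph here.
            } finally {
                linguist.stopRecognition();
            }
        } finally {
            linguist.deallocate();
        }
    }

    // Runs one recognition against the recording stub and returns the calls.
    static List<String> demo() {
        RecordingLinguist linguist = new RecordingLinguist();
        try {
            recognizeOnce(linguist);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return linguist.calls;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a long-running recognizer the allocate and deallocate calls would normally bracket many recognitions rather than one; the single pass above is only meant to show the pairing of calls.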


Field Summary
static java.lang.String PROP_ADD_FILLER_WORDS
          Property that controls whether filler words are automatically added to the vocabulary
static boolean PROP_ADD_FILLER_WORDS_DEFAULT
          The default value for PROP_ADD_FILLER_WORDS.
static java.lang.String PROP_COMPOSITE_THRESHOLD
          Property to control the maximum number of right contexts to consider before switching over to using composite HMMs
static int PROP_COMPOSITE_THRESHOLD_DEFAULT
          The default value for PROP_COMPOSITE_THRESHOLD.
static java.lang.String PROP_FILLER_INSERTION_PROBABILITY
          Filler insertion probability property
static double PROP_FILLER_INSERTION_PROBABILITY_DEFAULT
          The default value for PROP_FILLER_INSERTION_PROBABILITY.
static java.lang.String PROP_GENERATE_UNIT_STATES
          Property to control whether or not the linguist will generate unit states.
static boolean PROP_GENERATE_UNIT_STATES_DEFAULT
          The default value for PROP_GENERATE_UNIT_STATES
static java.lang.String PROP_LANGUAGE_WEIGHT
          Sphinx property that defines the language weight for the search
static float PROP_LANGUAGE_WEIGHT_DEFAULT
          The default value for the PROP_LANGUAGE_WEIGHT property
static java.lang.String PROP_SHOW_COMPILATION_PROGRESS
          Property to control whether compilation progress is displayed on stdout.
static boolean PROP_SHOW_COMPILATION_PROGRESS_DEFAULT
          The default value for PROP_SHOW_COMPILATION_PROGRESS.
static java.lang.String PROP_SHOW_SEARCH_SPACE
          Property to control the dumping of the search space
static boolean PROP_SHOW_SEARCH_SPACE_DEFAULT
          The default value for PROP_SHOW_SEARCH_SPACE.
static java.lang.String PROP_SILENCE_INSERTION_PROBABILITY
          Silence insertion probability property
static double PROP_SILENCE_INSERTION_PROBABILITY_DEFAULT
          The default value for PROP_SILENCE_INSERTION_PROBABILITY.
static java.lang.String PROP_SPREAD_WORD_PROBABILITIES_ACROSS_PRONUNCIATIONS
          Property that controls whether word probabilities are spread across all pronunciations.
static boolean PROP_SPREAD_WORD_PROBABILITIES_ACROSS_PRONUNCIATIONS_DEFAULT
          The default value for PROP_SPREAD_WORD_PROBABILITIES_ACROSS_PRONUNCIATIONS.
static java.lang.String PROP_UNIGRAM_SMEAR_WEIGHT
          A sphinx property that determines the weight of the smear
static float PROP_UNIGRAM_SMEAR_WEIGHT_DEFAULT
          The default value for PROP_UNIGRAM_SMEAR_WEIGHT
static java.lang.String PROP_UNIT_INSERTION_PROBABILITY
          Unit insertion probability property
static double PROP_UNIT_INSERTION_PROBABILITY_DEFAULT
          The default value for PROP_UNIT_INSERTION_PROBABILITY.
static java.lang.String PROP_VALIDATE_SEARCH_SPACE
          Property to control the validation of the search space
static boolean PROP_VALIDATE_SEARCH_SPACE_DEFAULT
          The default value for PROP_VALIDATE_SEARCH_SPACE.
static java.lang.String PROP_WANT_UNIGRAM_SMEAR
          A sphinx property that determines whether or not unigram probabilities are smeared through the lex tree
static boolean PROP_WANT_UNIGRAM_SMEAR_DEFAULT
          The default value for PROP_WANT_UNIGRAM_SMEAR
static java.lang.String PROP_WORD_INSERTION_PROBABILITY
          Word insertion probability property
static double PROP_WORD_INSERTION_PROBABILITY_DEFAULT
          The default value for PROP_WORD_INSERTION_PROBABILITY
 
Method Summary
 void allocate()
          Allocates the linguist.
 void deallocate()
          Deallocates the linguist.
 SearchGraph getSearchGraph()
          Retrieves the search graph.
 void startRecognition()
          Called before a recognition.
 void stopRecognition()
          Called after a recognition.
 
Methods inherited from interface edu.cmu.sphinx.util.props.Configurable
getName, newProperties, register
 

Field Detail

PROP_WORD_INSERTION_PROBABILITY

public static final java.lang.String PROP_WORD_INSERTION_PROBABILITY
Word insertion probability property

See Also:
Constant Field Values

PROP_WORD_INSERTION_PROBABILITY_DEFAULT

public static final double PROP_WORD_INSERTION_PROBABILITY_DEFAULT
The default value for PROP_WORD_INSERTION_PROBABILITY

See Also:
Constant Field Values

PROP_UNIT_INSERTION_PROBABILITY

public static final java.lang.String PROP_UNIT_INSERTION_PROBABILITY
Unit insertion probability property

See Also:
Constant Field Values

PROP_UNIT_INSERTION_PROBABILITY_DEFAULT

public static final double PROP_UNIT_INSERTION_PROBABILITY_DEFAULT
The default value for PROP_UNIT_INSERTION_PROBABILITY.

See Also:
Constant Field Values

PROP_SILENCE_INSERTION_PROBABILITY

public static final java.lang.String PROP_SILENCE_INSERTION_PROBABILITY
Silence insertion probability property

See Also:
Constant Field Values

PROP_SILENCE_INSERTION_PROBABILITY_DEFAULT

public static final double PROP_SILENCE_INSERTION_PROBABILITY_DEFAULT
The default value for PROP_SILENCE_INSERTION_PROBABILITY.

See Also:
Constant Field Values

PROP_FILLER_INSERTION_PROBABILITY

public static final java.lang.String PROP_FILLER_INSERTION_PROBABILITY
Filler insertion probability property

See Also:
Constant Field Values

PROP_FILLER_INSERTION_PROBABILITY_DEFAULT

public static final double PROP_FILLER_INSERTION_PROBABILITY_DEFAULT
The default value for PROP_FILLER_INSERTION_PROBABILITY.

See Also:
Constant Field Values

PROP_LANGUAGE_WEIGHT

public static final java.lang.String PROP_LANGUAGE_WEIGHT
Sphinx property that defines the language weight for the search

See Also:
Constant Field Values

PROP_LANGUAGE_WEIGHT_DEFAULT

public static final float PROP_LANGUAGE_WEIGHT_DEFAULT
The default value for the PROP_LANGUAGE_WEIGHT property

See Also:
Constant Field Values

PROP_COMPOSITE_THRESHOLD

public static final java.lang.String PROP_COMPOSITE_THRESHOLD
Property to control the maximum number of right contexts to consider before switching over to using composite HMMs

See Also:
Constant Field Values

PROP_COMPOSITE_THRESHOLD_DEFAULT

public static final int PROP_COMPOSITE_THRESHOLD_DEFAULT
The default value for PROP_COMPOSITE_THRESHOLD.

See Also:
Constant Field Values

PROP_SPREAD_WORD_PROBABILITIES_ACROSS_PRONUNCIATIONS

public static final java.lang.String PROP_SPREAD_WORD_PROBABILITIES_ACROSS_PRONUNCIATIONS
Property that controls whether word probabilities are spread across all pronunciations.

See Also:
Constant Field Values

PROP_SPREAD_WORD_PROBABILITIES_ACROSS_PRONUNCIATIONS_DEFAULT

public static final boolean PROP_SPREAD_WORD_PROBABILITIES_ACROSS_PRONUNCIATIONS_DEFAULT
The default value for PROP_SPREAD_WORD_PROBABILITIES_ACROSS_PRONUNCIATIONS.

See Also:
Constant Field Values

PROP_ADD_FILLER_WORDS

public static final java.lang.String PROP_ADD_FILLER_WORDS
Property that controls whether filler words are automatically added to the vocabulary

See Also:
Constant Field Values

PROP_ADD_FILLER_WORDS_DEFAULT

public static final boolean PROP_ADD_FILLER_WORDS_DEFAULT
The default value for PROP_ADD_FILLER_WORDS.

See Also:
Constant Field Values

PROP_SHOW_SEARCH_SPACE

public static final java.lang.String PROP_SHOW_SEARCH_SPACE
Property to control the dumping of the search space

See Also:
Constant Field Values

PROP_SHOW_SEARCH_SPACE_DEFAULT

public static final boolean PROP_SHOW_SEARCH_SPACE_DEFAULT
The default value for PROP_SHOW_SEARCH_SPACE.

See Also:
Constant Field Values

PROP_VALIDATE_SEARCH_SPACE

public static final java.lang.String PROP_VALIDATE_SEARCH_SPACE
Property to control the validation of the search space

See Also:
Constant Field Values

PROP_VALIDATE_SEARCH_SPACE_DEFAULT

public static final boolean PROP_VALIDATE_SEARCH_SPACE_DEFAULT
The default value for PROP_VALIDATE_SEARCH_SPACE.

See Also:
Constant Field Values

PROP_SHOW_COMPILATION_PROGRESS

public static final java.lang.String PROP_SHOW_COMPILATION_PROGRESS
Property to control whether compilation progress is displayed on stdout. If this property is true, a dot is displayed for every 1000 search states added to the search space.

See Also:
Constant Field Values
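
The progress behavior described for PROP_SHOW_COMPILATION_PROGRESS could be approximated as below. This is only an illustration of the "one dot per 1000 states" rule; the class, method, and constant names are hypothetical and are not taken from any linguist implementation.

```java
class CompilationProgressSketch {
    // From the property description: one dot per 1000 states added.
    static final int STATES_PER_DOT = 1000;

    // Returns the progress text that would be emitted while adding
    // stateCount states, or an empty string when progress is disabled.
    static String progressFor(int stateCount, boolean showProgress) {
        StringBuilder out = new StringBuilder();
        for (int added = 1; added <= stateCount; added++) {
            if (showProgress && added % STATES_PER_DOT == 0) {
                out.append('.');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(progressFor(3500, true)); // prints ...
    }
}
```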

PROP_SHOW_COMPILATION_PROGRESS_DEFAULT

public static final boolean PROP_SHOW_COMPILATION_PROGRESS_DEFAULT
The default value for PROP_SHOW_COMPILATION_PROGRESS.

See Also:
Constant Field Values

PROP_GENERATE_UNIT_STATES

public static final java.lang.String PROP_GENERATE_UNIT_STATES
Property to control whether or not the linguist will generate unit states. When this property is false the linguist may omit UnitSearchState states. For some search algorithms this will allow for a faster search with more compact results.

See Also:
Constant Field Values

PROP_GENERATE_UNIT_STATES_DEFAULT

public static final boolean PROP_GENERATE_UNIT_STATES_DEFAULT
The default value for PROP_GENERATE_UNIT_STATES

See Also:
Constant Field Values

PROP_WANT_UNIGRAM_SMEAR

public static final java.lang.String PROP_WANT_UNIGRAM_SMEAR
A sphinx property that determines whether or not unigram probabilities are smeared through the lex tree

See Also:
Constant Field Values

PROP_WANT_UNIGRAM_SMEAR_DEFAULT

public static final boolean PROP_WANT_UNIGRAM_SMEAR_DEFAULT
The default value for PROP_WANT_UNIGRAM_SMEAR

See Also:
Constant Field Values

PROP_UNIGRAM_SMEAR_WEIGHT

public static final java.lang.String PROP_UNIGRAM_SMEAR_WEIGHT
A sphinx property that determines the weight of the smear

See Also:
Constant Field Values

PROP_UNIGRAM_SMEAR_WEIGHT_DEFAULT

public static final float PROP_UNIGRAM_SMEAR_WEIGHT_DEFAULT
The default value for PROP_UNIGRAM_SMEAR_WEIGHT

See Also:
Constant Field Values
Method Detail

getSearchGraph

public SearchGraph getSearchGraph()
Retrieves the search graph. The search graph represents the search space used to guide the search.

Implementor's note: This method is typically called at the beginning of each recognition and therefore should be

Returns:
the search graph

startRecognition

public void startRecognition()
Called before a recognition. This method gives a linguist the opportunity to prepare itself before a recognition begins.

Implementor's Note - Some linguists (or underlying language or acoustic models) may keep caches or pools that need to be initialized before a recognition. A linguist may implement this method to perform such initialization. Note, however, that an ideal linguist will, once allocated, be stateless. This allows the linguist to be shared by multiple simultaneous searches. Reliance on 'startRecognition' may prevent a linguist from being used in a multi-threaded search.


stopRecognition

public void stopRecognition()
Called after a recognition. This method gives a linguist the opportunity to clean up after a recognition has been completed.

Implementor's Note - Some linguists (or underlying language or acoustic models) may keep caches or pools that need to be flushed after a recognition. A linguist may implement this method to perform such flushing. Note, however, that an ideal linguist will, once allocated, be stateless. This allows the linguist to be shared by multiple simultaneous searches. Reliance on 'stopRecognition' may prevent a linguist from being used in a multi-threaded search.


allocate

public void allocate()
              throws java.io.IOException
Allocates the linguist. Resources allocated by the linguist are allocated here. This method may take many seconds to complete depending upon the linguist.

Implementor's Note - A well-written linguist will allow allocate to be called multiple times without harm. This allows a linguist to be shared by multiple search managers.

Throws:
java.io.IOException - if an IO error occurs

deallocate

public void deallocate()
Deallocates the linguist. Any resources allocated by this linguist are released.

Implementor's Note - If the linguist is being shared by multiple searches, deallocate should only actually release resources when the last call to deallocate is made. There are two approaches for dealing with this: (1) Keep an allocation counter that is incremented during allocate and decremented during deallocate. Only when the counter reaches zero should the actual deallocation be performed. (2) Do nothing in deallocate and just let the GC take care of things.
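
Approach (1), the allocation counter, might look like the following sketch. CountingLinguist is a hypothetical class written for this example; the loads and releases counters exist only so the behavior can be observed, and ignoring an unmatched deallocate call is a design choice made here, not something this page specifies.

```java
class SharedAllocationSketch {
    // Hypothetical linguist that tolerates repeated allocate calls and
    // releases its resources only on the last matching deallocate.
    static final class CountingLinguist {
        private int allocationCount = 0;
        private boolean resourcesLoaded = false;
        int loads = 0;     // how many times resources were actually loaded
        int releases = 0;  // how many times resources were actually released

        synchronized void allocate() {
            if (allocationCount == 0) {
                resourcesLoaded = true; // expensive loading would happen here
                loads++;
            }
            allocationCount++;
        }

        synchronized void deallocate() {
            if (allocationCount == 0) {
                return; // unmatched call: ignored in this sketch
            }
            allocationCount--;
            if (allocationCount == 0) {
                resourcesLoaded = false; // actual release happens here
                releases++;
            }
        }

        synchronized boolean isAllocated() {
            return resourcesLoaded;
        }
    }

    public static void main(String[] args) {
        CountingLinguist linguist = new CountingLinguist();
        linguist.allocate();   // first search manager
        linguist.allocate();   // second search manager, no reload
        linguist.deallocate(); // one user remains
        System.out.println(linguist.isAllocated()); // prints true
        linguist.deallocate(); // last user gone
        System.out.println(linguist.isAllocated()); // prints false
        System.out.println(linguist.loads + " load, " + linguist.releases + " release");
    }
}
```

The methods are synchronized because the text anticipates sharing across searches; approach (2) needs no counter at all, at the cost of holding resources until the garbage collector reclaims them.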