edu.cmu.sphinx.linguist.lextree
Class LexTreeLinguist

java.lang.Object
  extended by edu.cmu.sphinx.linguist.lextree.LexTreeLinguist
All Implemented Interfaces:
Configurable, Linguist

public class LexTreeLinguist
extends java.lang.Object
implements Linguist

A linguist that can represent large vocabularies efficiently. This class implements the Linguist interface. The main role of any linguist is to represent the search space for the decoder. The initial state in the search space can be retrieved by a SearchManager via a call to getInitialSearchState. This method returns a SearchState. Successor states can be retrieved via calls to SearchState.getSuccessors(). There are a number of search state subinterfaces that are used to indicate different types of states in the search space:

A linguist has a great deal of latitude about the order in which it returns states. For instance, a 'flat' linguist may return a WordState at the beginning of a word, while a 'tree' linguist may return WordStates at the end of a word. Likewise, a linguist may omit certain state types completely (such as a unit state). Some SearchManagers may want to know a priori the order in which states will be generated by the linguist. The method getSearchStateOrder can be used to retrieve the order of states returned by the linguist.

Depending on the vocabulary size and topology, the search space represented by the linguist may include a very large number of states. Some linguists will generate the search states dynamically, that is, the object representing a particular state in the search space is not created until it is needed by the SearchManager. SearchManagers often need to be able to determine if a particular state has been entered before by comparing states. Because SearchStates may be generated dynamically, the SearchState.equals() call (as opposed to the reference equals '==' method) should be used to determine if states are equal. The states returned by the linguist will generally provide very efficient implementations of equals and hashCode. This will allow a SearchManager to maintain collections of states in HashMaps efficiently.
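
The role of equals and hashCode described above can be illustrated with a minimal sketch. The classes SketchState and SearchSpaceWalker are hypothetical stand-ins, not part of the Sphinx API. SketchState creates fresh successor objects on every call, the way a dynamic linguist might, so a walk over the space only terminates if visited states are compared by value.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical stand-in for a dynamically generated search state. */
final class SketchState {
    private final int id; // identifies the state within a 4-state toy space

    SketchState(int id) {
        this.id = id;
    }

    /** Builds new successor objects on every call; none are stored. */
    List<SketchState> getSuccessors() {
        List<SketchState> successors = new ArrayList<>();
        successors.add(new SketchState((id + 1) % 4));
        successors.add(new SketchState((id + 2) % 4));
        return successors;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SketchState && ((SketchState) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

/** Hypothetical walker that counts the distinct states reachable from a start state. */
final class SearchSpaceWalker {
    static int countReachable(SketchState initial) {
        Set<SketchState> visited = new HashSet<>(); // relies on equals/hashCode
        Queue<SketchState> queue = new ArrayDeque<>();
        visited.add(initial);
        queue.add(initial);
        while (!queue.isEmpty()) {
            SketchState state = queue.remove();
            for (SketchState next : state.getSuccessors()) {
                // add() returns false when an equal state was already entered
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return visited.size();
    }
}
```

Calling SearchSpaceWalker.countReachable(new SketchState(0)) visits each of the four toy states once, even though every successor is a newly created object. With reference comparison ('==') no state would ever be recognized as already entered.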

LexTreeLinguist Characteristics

Some characteristics of this linguist:

This linguist is not a general purpose linguist. It does impose some constraints:

Design Notes

The following are some notes describing the design of this linguist. They may be helpful to those who want to understand how this linguist works but are not necessary if you are only interested in using this linguist.

Search Space Representation

It has been shown that representing the search space as a tree can greatly reduce the number of active states in a search since the units at the beginnings of words can be shared across multiple words. For example, with a large vocabulary (60K words), at the end of a word, with a flat representation, we have to provide transitions to the initial state of each possible word. That is 60K transitions. In a tree-based system we only need to provide transitions to each initial phone (within its context). That is about 1600 transitions. This is a substantial reduction. Conceptually, this tree consists of a node for each possible initial unit. Each node can have an arbitrary number of children which can be either unit nodes or word nodes.
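
The sharing of initial units can be sketched with a toy prefix tree. The classes LexNode and TinyLexTree are hypothetical and are not the HMMTree implementation. The sketch also ignores phonetic context, which the real tree takes into account.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tree node: children are either unit nodes or word nodes. */
final class LexNode {
    final Map<String, LexNode> unitChildren = new LinkedHashMap<>();
    final List<String> words = new ArrayList<>();
}

/** Hypothetical toy lex tree keyed on context-independent units. */
final class TinyLexTree {
    private final LexNode root = new LexNode();
    private int wordCount = 0;

    /** Adds a word, reusing any unit nodes it shares with earlier words. */
    void addWord(String word, List<String> units) {
        LexNode node = root;
        for (String unit : units) {
            node = node.unitChildren.computeIfAbsent(unit, u -> new LexNode());
        }
        node.words.add(word);
        wordCount++;
    }

    /** A flat representation needs one word-entry transition per word. */
    int flatTransitions() {
        return wordCount;
    }

    /** A tree needs one word-entry transition per distinct initial unit. */
    int treeTransitions() {
        return root.unitChildren.size();
    }
}
```

Adding "cat" (K AE T), "cap" (K AE P), "can" (K AE N) and "dog" (D AO G) gives four word-entry transitions in the flat case but only two in the tree, since three of the words share the initial unit K.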

This linguist uses the HMMTree class to build and represent the tree. The HMMTree is given the dictionary and language model and builds the lex tree. Instead of representing the nodes in the tree as phonemes and words as is typically done, the HMMTree represents the tree as HMMs and words. The HMM is essentially a unit within its context. This is typically a triphone (although for some units, such as SIL, it is a simple phone). Representing the nodes as HMMs instead of phonemes yields a much larger tree, but also has some advantages:

There are some disadvantages in representing the tree with HMMs:

Luckily, the size and speed issues can be mitigated (by adding a bit more complexity, of course). The bulk of the nodes in the HMM tree are the word ending nodes. There is a word ending node for each possible right context. To reduce space, all of the word ending nodes are replaced by a single EndNode. During the search, the actual HMM nodes for a particular EndNode are generated on request. These sets of HMM nodes can be shared among different word endings, and therefore are cached. The effect of using this EndNode optimization is to reduce the space required by the tree by about 300 MB and the time required to generate the tree from about 60 seconds to about 6 seconds.
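
The EndNode idea (expand on request, then cache and share) can be sketched as follows. EndNodeExpander is a hypothetical class, and the cache key (left context plus base unit) and the node naming are assumptions made for illustration. The actual HMMTree may organize this differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical on-demand expander for word ending nodes. */
final class EndNodeExpander {
    private final List<String> rightContexts;
    private final Map<String, List<String>> cache = new HashMap<>();
    private int expansions = 0; // how many times nodes were actually generated

    EndNodeExpander(List<String> rightContexts) {
        this.rightContexts = new ArrayList<>(rightContexts);
    }

    /**
     * Returns one node label per possible right context for the given word
     * ending. The list is built on first request and shared afterwards.
     */
    List<String> expand(String leftContext, String baseUnit) {
        String key = leftContext + "-" + baseUnit; // assumed cache key
        return cache.computeIfAbsent(key, k -> {
            expansions++;
            List<String> nodes = new ArrayList<>();
            for (String right : rightContexts) {
                nodes.add(leftContext + "-" + baseUnit + "+" + right);
            }
            return nodes;
        });
    }

    int getExpansions() {
        return expansions;
    }
}
```

Two words that end in the same unit with the same left context receive the same cached list, so the per-right-context nodes are generated once rather than stored for every word ending in the tree.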


Nested Class Summary
 class LexTreeLinguist.LexTreeEndUnitState
          Represents a unit in the search space
 class LexTreeLinguist.LexTreeEndWordState
          Represents the final end of utterance word
 class LexTreeLinguist.LexTreeHMMState
          Represents an HMM state in the search space
 class LexTreeLinguist.LexTreeNonEmittingHMMState
          Represents a non-emitting HMM state
 class LexTreeLinguist.LexTreeUnitState
          Represents a unit in the search space
 class LexTreeLinguist.LexTreeWordState
          Represents a word state in the search space
 
Field Summary
static java.lang.String PROP_ACOUSTIC_MODEL
          A sphinx property used to define the acoustic model to use when building the search graph
static java.lang.String PROP_CACHE_SIZE
          A sphinx property that defines the size of the arc cache (zero to disable the cache).
static int PROP_CACHE_SIZE_DEFAULT
          The default value for PROP_CACHE_SIZE
static java.lang.String PROP_DICTIONARY
          Property that defines the dictionary to use for this grammar
static java.lang.String PROP_FULL_WORD_HISTORIES
          A sphinx property that determines whether or not full word histories are used to determine when two states are equal.
static boolean PROP_FULL_WORD_HISTORIES_DEFAULT
          The default value for PROP_FULL_WORD_HISTORIES
static java.lang.String PROP_GRAMMAR
          A sphinx property used to define the grammar to use when building the search graph
static java.lang.String PROP_LANGUAGE_MODEL
          A sphinx property for the language model to be used by this grammar
static java.lang.String PROP_LOG_MATH
          Sphinx property that defines the name of the logmath to be used by this linguist.
static java.lang.String PROP_UNIT_MANAGER
          A sphinx property used to define the unit manager to use when building the search graph
 
Fields inherited from interface edu.cmu.sphinx.linguist.Linguist
PROP_ADD_FILLER_WORDS, PROP_ADD_FILLER_WORDS_DEFAULT, PROP_COMPOSITE_THRESHOLD, PROP_COMPOSITE_THRESHOLD_DEFAULT, PROP_FILLER_INSERTION_PROBABILITY, PROP_FILLER_INSERTION_PROBABILITY_DEFAULT, PROP_GENERATE_UNIT_STATES, PROP_GENERATE_UNIT_STATES_DEFAULT, PROP_LANGUAGE_WEIGHT, PROP_LANGUAGE_WEIGHT_DEFAULT, PROP_SHOW_COMPILATION_PROGRESS, PROP_SHOW_COMPILATION_PROGRESS_DEFAULT, PROP_SHOW_SEARCH_SPACE, PROP_SHOW_SEARCH_SPACE_DEFAULT, PROP_SILENCE_INSERTION_PROBABILITY, PROP_SILENCE_INSERTION_PROBABILITY_DEFAULT, PROP_SPREAD_WORD_PROBABILITIES_ACROSS_PRONUNCIATIONS, PROP_SPREAD_WORD_PROBABILITIES_ACROSS_PRONUNCIATIONS_DEFAULT, PROP_UNIGRAM_SMEAR_WEIGHT, PROP_UNIGRAM_SMEAR_WEIGHT_DEFAULT, PROP_UNIT_INSERTION_PROBABILITY, PROP_UNIT_INSERTION_PROBABILITY_DEFAULT, PROP_VALIDATE_SEARCH_SPACE, PROP_VALIDATE_SEARCH_SPACE_DEFAULT, PROP_WANT_UNIGRAM_SMEAR, PROP_WANT_UNIGRAM_SMEAR_DEFAULT, PROP_WORD_INSERTION_PROBABILITY, PROP_WORD_INSERTION_PROBABILITY_DEFAULT
 
Constructor Summary
LexTreeLinguist()
           
 
Method Summary
 void allocate()
          Allocates the linguist.
 void deallocate()
          Deallocates the linguist.
 LanguageModel getLanguageModel()
          Retrieves the language model for this linguist
 java.lang.String getName()
          Retrieves the name for this configurable component
 SearchGraph getSearchGraph()
          Retrieves search graph.
 void newProperties(PropertySheet ps)
          This method is called when this configurable component has new data.
 void register(java.lang.String name, Registry registry)
          Register my properties.
 void startRecognition()
          Called before a recognition
 void stopRecognition()
          Called after a recognition
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PROP_GRAMMAR

public static final java.lang.String PROP_GRAMMAR
A sphinx property used to define the grammar to use when building the search graph

See Also:
Constant Field Values

PROP_ACOUSTIC_MODEL

public static final java.lang.String PROP_ACOUSTIC_MODEL
A sphinx property used to define the acoustic model to use when building the search graph

See Also:
Constant Field Values

PROP_UNIT_MANAGER

public static final java.lang.String PROP_UNIT_MANAGER
A sphinx property used to define the unit manager to use when building the search graph

See Also:
Constant Field Values

PROP_LOG_MATH

public static final java.lang.String PROP_LOG_MATH
Sphinx property that defines the name of the logmath to be used by this linguist.

See Also:
Constant Field Values

PROP_FULL_WORD_HISTORIES

public static final java.lang.String PROP_FULL_WORD_HISTORIES
A sphinx property that determines whether or not full word histories are used to determine when two states are equal.

See Also:
Constant Field Values

PROP_FULL_WORD_HISTORIES_DEFAULT

public static final boolean PROP_FULL_WORD_HISTORIES_DEFAULT
The default value for PROP_FULL_WORD_HISTORIES

See Also:
Constant Field Values

PROP_LANGUAGE_MODEL

public static final java.lang.String PROP_LANGUAGE_MODEL
A sphinx property for the language model to be used by this grammar

See Also:
Constant Field Values

PROP_DICTIONARY

public static final java.lang.String PROP_DICTIONARY
Property that defines the dictionary to use for this grammar

See Also:
Constant Field Values

PROP_CACHE_SIZE

public static final java.lang.String PROP_CACHE_SIZE
A sphinx property that defines the size of the arc cache (zero to disable the cache).

See Also:
Constant Field Values
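
A size-bounded cache in which a size of zero disables caching can be sketched as below. ArcCache is a hypothetical class, and the least-recently-used eviction policy is an assumption. This page does not state how the linguist's arc cache chooses entries to discard.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache; a maximum size of zero turns it off. */
final class ArcCache<K, V> {
    private final int maxSize;
    private final Map<K, V> map;

    ArcCache(int maxSize) {
        this.maxSize = maxSize;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > ArcCache.this.maxSize;
            }
        };
    }

    /** Returns the cached value, or null on a miss or when caching is disabled. */
    V get(K key) {
        return maxSize == 0 ? null : map.get(key);
    }

    /** Stores a value unless caching is disabled. */
    void put(K key, V value) {
        if (maxSize > 0) {
            map.put(key, value);
        }
    }

    int size() {
        return map.size();
    }
}
```

With a maximum size of zero every lookup misses, so the caller always recomputes the arcs. This gives the "zero to disable" behavior without a separate on/off switch.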

PROP_CACHE_SIZE_DEFAULT

public static final int PROP_CACHE_SIZE_DEFAULT
The default value for PROP_CACHE_SIZE

See Also:
Constant Field Values
Constructor Detail

LexTreeLinguist

public LexTreeLinguist()
Method Detail

register

public void register(java.lang.String name,
                     Registry registry)
              throws PropertyException
Description copied from interface: Configurable
Register my properties. This method is called once early in the lifetime of the component, shortly after the component is constructed. This component should register any configuration properties that it needs to register. If this configurable extends another configurable, super.register should also be called.

Specified by:
register in interface Configurable
Parameters:
name - the name of the component
registry - the registry for this component
Throws:
PropertyException

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component has new data. The component should first validate the data. If it is bad the component should return false. If the data is good, the component should record the data internally and return true.

Specified by:
newProperties in interface Configurable
Parameters:
ps - a property sheet holding the new data
Throws:
PropertyException - if there is a problem with the properties.

getName

public java.lang.String getName()
Description copied from interface: Configurable
Retrieves the name for this configurable component

Specified by:
getName in interface Configurable
Returns:
the name

allocate

public void allocate()
              throws java.io.IOException
Description copied from interface: Linguist
Allocates the linguist. Resources allocated by the linguist are allocated here. This method may take many seconds to complete depending upon the linguist.

Implementor's Note - A well written linguist will allow allocate to be called multiple times without harm. This will allow a linguist to be shared by multiple search managers.

Specified by:
allocate in interface Linguist
Throws:
java.io.IOException - if an IO error occurs

deallocate

public void deallocate()
Description copied from interface: Linguist
Deallocates the linguist. Any resources allocated by this linguist are released.

Implementor's Note - if the linguist is being shared by multiple searches, deallocate should only actually deallocate things when the last call to deallocate is made. Two approaches for dealing with this: (1) Keep an allocation counter that is incremented during allocate and decremented during deallocate. Only when the counter reaches zero should the actual deallocation be performed. (2) Do nothing in deallocate - just let the GC take care of things.

Specified by:
deallocate in interface Linguist
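
Approach (1) from the implementor's note can be sketched as follows. RefCountedResource is a hypothetical class and the int array stands in for whatever expensive resources a linguist holds. Ignoring unmatched deallocate calls is a design choice of this sketch, not documented behavior.

```java
/** Hypothetical component that is shared by several users. */
final class RefCountedResource {
    private int allocationCount = 0;
    private int[] table; // stands in for an expensive resource

    /** Allocates on the first call; later calls only bump the counter. */
    synchronized void allocate() {
        if (allocationCount == 0) {
            table = new int[1024];
        }
        allocationCount++;
    }

    /** Releases the resource only when the last user deallocates. */
    synchronized void deallocate() {
        if (allocationCount == 0) {
            return; // unmatched call: nothing to release
        }
        allocationCount--;
        if (allocationCount == 0) {
            table = null;
        }
    }

    synchronized boolean isAllocated() {
        return table != null;
    }
}
```

With two users, the first deallocate leaves the resource in place and the second one releases it. Repeated allocate calls are harmless, matching the implementor's note on allocate.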

getSearchGraph

public SearchGraph getSearchGraph()
Description copied from interface: Linguist
Retrieves search graph. The search graph represents the search space to be used to guide the search.

Implementor's note: This method is typically called at the beginning of each recognition and therefore should be

Specified by:
getSearchGraph in interface Linguist
Returns:
the search graph

startRecognition

public void startRecognition()
Called before a recognition

Specified by:
startRecognition in interface Linguist

stopRecognition

public void stopRecognition()
Called after a recognition

Specified by:
stopRecognition in interface Linguist

getLanguageModel

public LanguageModel getLanguageModel()
Retrieves the language model for this linguist

Returns:
the language model (or null if there is none)