edu.cmu.sphinx.linguist.language.ngram.large
Class LargeTrigramModel

java.lang.Object
  extended by edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel
All Implemented Interfaces:
Configurable, LanguageModel

public class LargeTrigramModel
extends java.lang.Object
implements LanguageModel

Queries a binary language model file generated by the CMU-Cambridge Statistical Language Modelling Toolkit. Note that all probabilities in the grammar are stored in the LogMath log base. Language probabilities in the language model file are stored in log base 10; they are converted to the LogMath log base.
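The conversion from log base 10 to another log base follows from the change-of-base identity log_b(p) = log10(p) / log10(b). The sketch below only illustrates that arithmetic; the class name `LogBaseConversion`, the helper `log10ToLogBase`, and the base value 1.0001 are assumptions chosen for the example, not part of this class's API. The actual base is whatever the configured LogMath component uses.

```java
// Illustrative sketch only: converts a log10 probability to another log base.
final class LogBaseConversion {

    // Assumed base for the example; the real base comes from the LogMath component.
    static final double ASSUMED_LOG_BASE = 1.0001;

    /** Converts log10(p) to log_base(p) using log_b(p) = log10(p) / log10(b). */
    static float log10ToLogBase(float log10Value, double base) {
        return (float) (log10Value / Math.log10(base));
    }

    public static void main(String[] args) {
        float log10Prob = -2.0f; // corresponds to p = 0.01
        float converted = log10ToLogBase(log10Prob, ASSUMED_LOG_BASE);
        System.out.println("log10 value " + log10Prob + " in assumed base: " + converted);
    }
}
```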


Field Summary
static int BYTES_PER_BIGRAM
          The number of bytes per bigram in the LM file generated by the CMU-Cambridge Statistical Language Modelling Toolkit.
static int BYTES_PER_TRIGRAM
          The number of bytes per trigram in the LM file generated by the CMU-Cambridge Statistical Language Modelling Toolkit.
static java.lang.String PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP
          Sphinx property that controls whether or not the language model will apply the language weight and word insertion probability
static boolean PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP_DEFAULT
          The default value for PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP
static java.lang.String PROP_BIGRAM_CACHE_SIZE
          A sphinx property that defines the maximum number of bigrams to be cached.
static int PROP_BIGRAM_CACHE_SIZE_DEFAULT
          The default value for the PROP_BIGRAM_CACHE_SIZE property
static java.lang.String PROP_CLEAR_CACHES_AFTER_UTTERANCE
          A sphinx property that controls whether the bigram and trigram caches are cleared after every utterance
static boolean PROP_CLEAR_CACHES_AFTER_UTTERANCE_DEFAULT
          The default value for the PROP_CLEAR_CACHES_AFTER_UTTERANCE property
static java.lang.String PROP_FULL_SMEAR
          If true, use full bigram information to determine smear
static boolean PROP_FULL_SMEAR_DEFAULT
          Default value for PROP_FULL_SMEAR
static java.lang.String PROP_LANGUAGE_WEIGHT
          Sphinx property that defines the language weight for the search
static float PROP_LANGUAGE_WEIGHT_DEFAULT
          The default value for the PROP_LANGUAGE_WEIGHT property
static java.lang.String PROP_LOG_MATH
          Sphinx property that defines the logMath component.
static java.lang.String PROP_QUERY_LOG_FILE
          Sphinx property for the name of the file that logs all the queried N-grams.
static java.lang.String PROP_QUERY_LOG_FILE_DEFAULT
          The default value for PROP_QUERY_LOG_FILE.
static java.lang.String PROP_TRIGRAM_CACHE_SIZE
          A sphinx property that defines the maximum number of trigrams to be cached
static int PROP_TRIGRAM_CACHE_SIZE_DEFAULT
          The default value for the PROP_TRIGRAM_CACHE_SIZE property
static java.lang.String PROP_WORD_INSERTION_PROBABILITY
          Word insertion probability property
static double PROP_WORD_INSERTION_PROBABILITY_DEFAULT
          The default value for PROP_WORD_INSERTION_PROBABILITY
 
Fields inherited from interface edu.cmu.sphinx.linguist.language.ngram.LanguageModel
PROP_DICTIONARY, PROP_FORMAT, PROP_FORMAT_DEFAULT, PROP_LOCATION, PROP_LOCATION_DEFAULT, PROP_MAX_DEPTH, PROP_MAX_DEPTH_DEFAULT, PROP_UNIGRAM_WEIGHT, PROP_UNIGRAM_WEIGHT_DEFAULT
 
Constructor Summary
LargeTrigramModel()
           
 
Method Summary
 void allocate()
          Create the language model
 void deallocate()
          Deallocate resources allocated to this language model
 float getBackoff(WordSequence wordSequence)
          Returns the backoff probability for the given sequence of words
 int getBigramMisses()
          Returns the number of times when a bigram is queried, but there is no bigram in the LM (in which case it uses the backoff probabilities).
 int getMaxDepth()
          Returns the maximum depth of the language model
 java.lang.String getName()
          Retrieves the name for this configurable component
 float getProbability(WordSequence wordSequence)
          Gets the ngram probability of the word sequence represented by the word list
 float getSmear(WordSequence wordSequence)
          Gets the smear term for the given wordSequence
 float getSmearOld(WordSequence wordSequence)
          Gets the smear term for the given wordSequence
 int getTrigramHits()
          Returns the number of trigram hits.
 int getTrigramMisses()
          Returns the number of times when a trigram is queried, but there is no trigram in the LM (in which case it uses the backoff probabilities).
 java.util.Set getVocabulary()
          Returns the set of words in the language model.
 int getWordID(Word word)
          Returns the ID of the given word.
 void newProperties(PropertySheet ps)
          This method is called when this configurable component has new data.
 void register(java.lang.String name, Registry registry)
          Register my properties.
 void start()
          Called before a recognition
 void stop()
          Called after a recognition
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PROP_QUERY_LOG_FILE

public static final java.lang.String PROP_QUERY_LOG_FILE
Sphinx property for the name of the file that logs all the queried N-grams. If this property is set to null, it means that the queried N-grams are not logged.

See Also:
Constant Field Values

PROP_QUERY_LOG_FILE_DEFAULT

public static final java.lang.String PROP_QUERY_LOG_FILE_DEFAULT
The default value for PROP_QUERY_LOG_FILE.


PROP_TRIGRAM_CACHE_SIZE

public static final java.lang.String PROP_TRIGRAM_CACHE_SIZE
A sphinx property that defines the maximum number of trigrams to be cached

See Also:
Constant Field Values

PROP_TRIGRAM_CACHE_SIZE_DEFAULT

public static final int PROP_TRIGRAM_CACHE_SIZE_DEFAULT
The default value for the PROP_TRIGRAM_CACHE_SIZE property

See Also:
Constant Field Values

PROP_BIGRAM_CACHE_SIZE

public static final java.lang.String PROP_BIGRAM_CACHE_SIZE
A sphinx property that defines the maximum number of bigrams to be cached.

See Also:
Constant Field Values

PROP_BIGRAM_CACHE_SIZE_DEFAULT

public static final int PROP_BIGRAM_CACHE_SIZE_DEFAULT
The default value for the PROP_BIGRAM_CACHE_SIZE property

See Also:
Constant Field Values

PROP_CLEAR_CACHES_AFTER_UTTERANCE

public static final java.lang.String PROP_CLEAR_CACHES_AFTER_UTTERANCE
A sphinx property that controls whether the bigram and trigram caches are cleared after every utterance

See Also:
Constant Field Values
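As a rough illustration of what a size-bounded cache that can be cleared after each utterance might look like, here is a minimal sketch. It assumes a least-recently-used eviction policy, which this documentation does not specify; the class name `BoundedNGramCache` is hypothetical and is not the cache used by LargeTrigramModel.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache; LRU eviction is an assumption made for this sketch.
final class BoundedNGramCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    BoundedNGramCache(int maxEntries) {
        // accessOrder = true makes iteration order follow recency of access.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the size limit is exceeded.
        return size() > maxEntries;
    }
}
```

Calling `clear()` on such a cache at the end of an utterance would correspond to enabling the clear-after-utterance behavior.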

PROP_CLEAR_CACHES_AFTER_UTTERANCE_DEFAULT

public static final boolean PROP_CLEAR_CACHES_AFTER_UTTERANCE_DEFAULT
The default value for the PROP_CLEAR_CACHES_AFTER_UTTERANCE property

See Also:
Constant Field Values

PROP_LANGUAGE_WEIGHT

public static final java.lang.String PROP_LANGUAGE_WEIGHT
Sphinx property that defines the language weight for the search

See Also:
Constant Field Values

PROP_LANGUAGE_WEIGHT_DEFAULT

public static final float PROP_LANGUAGE_WEIGHT_DEFAULT
The default value for the PROP_LANGUAGE_WEIGHT property

See Also:
Constant Field Values

PROP_LOG_MATH

public static final java.lang.String PROP_LOG_MATH
Sphinx property that defines the logMath component.

See Also:
Constant Field Values

PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP

public static final java.lang.String PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP
Sphinx property that controls whether or not the language model will apply the language weight and word insertion probability

See Also:
Constant Field Values
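A common convention in log-domain decoders is to scale the language model log probability by the language weight and then add the log of the word insertion probability. The sketch below shows that arithmetic under the assumption that this convention applies here; the class `LanguageScoreSketch` and method `applyWeightAndWip` are hypothetical names, and the exact formula used by LargeTrigramModel is not stated in this documentation.

```java
// Hypothetical helper: shows one conventional way to combine the two terms.
final class LanguageScoreSketch {

    /**
     * Scales a log probability by the language weight and adds the
     * word insertion probability, which is assumed to be in the same log base.
     */
    static float applyWeightAndWip(float logProbability, float languageWeight, float logWip) {
        return logProbability * languageWeight + logWip;
    }
}
```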

PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP_DEFAULT

public static final boolean PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP_DEFAULT
The default value for PROP_APPLY_LANGUAGE_WEIGHT_AND_WIP

See Also:
Constant Field Values

PROP_WORD_INSERTION_PROBABILITY

public static final java.lang.String PROP_WORD_INSERTION_PROBABILITY
Word insertion probability property

See Also:
Constant Field Values

PROP_WORD_INSERTION_PROBABILITY_DEFAULT

public static final double PROP_WORD_INSERTION_PROBABILITY_DEFAULT
The default value for PROP_WORD_INSERTION_PROBABILITY

See Also:
Constant Field Values

PROP_FULL_SMEAR

public static final java.lang.String PROP_FULL_SMEAR
If true, use full bigram information to determine smear

See Also:
Constant Field Values

PROP_FULL_SMEAR_DEFAULT

public static final boolean PROP_FULL_SMEAR_DEFAULT
Default value for PROP_FULL_SMEAR

See Also:
Constant Field Values

BYTES_PER_BIGRAM

public static final int BYTES_PER_BIGRAM
The number of bytes per bigram in the LM file generated by the CMU-Cambridge Statistical Language Modelling Toolkit.

See Also:
Constant Field Values

BYTES_PER_TRIGRAM

public static final int BYTES_PER_TRIGRAM
The number of bytes per trigram in the LM file generated by the CMU-Cambridge Statistical Language Modelling Toolkit.

See Also:
Constant Field Values
Constructor Detail

LargeTrigramModel

public LargeTrigramModel()
Method Detail

register

public void register(java.lang.String name,
                     Registry registry)
              throws PropertyException
Description copied from interface: Configurable
Register my properties. This method is called once early in the lifetime of the component, shortly after the component is constructed. The component should register any configuration properties that it needs. If this configurable extends another configurable, super.register should also be called.

Specified by:
register in interface Configurable
Parameters:
name - the name of the component
registry - the registry for this component
Throws:
PropertyException

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component has new data. The component should first validate the data. If it is bad, the component should return false. If the data is good, the component should record the data internally and return true.

Specified by:
newProperties in interface Configurable
Parameters:
ps - a property sheet holding the new data
Throws:
PropertyException - if there is a problem with the properties.

getName

public java.lang.String getName()
Description copied from interface: Configurable
Retrieves the name for this configurable component

Specified by:
getName in interface Configurable
Returns:
the name

allocate

public void allocate()
              throws java.io.IOException
Description copied from interface: LanguageModel
Create the language model

Specified by:
allocate in interface LanguageModel
Throws:
java.io.IOException

deallocate

public void deallocate()
Description copied from interface: LanguageModel
Deallocate resources allocated to this language model

Specified by:
deallocate in interface LanguageModel

start

public void start()
Called before a recognition

Specified by:
start in interface LanguageModel

stop

public void stop()
Called after a recognition

Specified by:
stop in interface LanguageModel

getProbability

public float getProbability(WordSequence wordSequence)
Gets the ngram probability of the word sequence represented by the word list

Specified by:
getProbability in interface LanguageModel
Parameters:
wordSequence - the word sequence
Returns:
the probability of the word sequence. Probability is in logMath log base

getWordID

public final int getWordID(Word word)
Returns the ID of the given word.

Parameters:
word - the word whose ID is to be found
Returns:
the ID of the word

getSmearOld

public float getSmearOld(WordSequence wordSequence)
Gets the smear term for the given wordSequence

Parameters:
wordSequence - the word sequence
Returns:
the smear term associated with this word sequence

getSmear

public float getSmear(WordSequence wordSequence)
Description copied from interface: LanguageModel
Gets the smear term for the given wordSequence

Specified by:
getSmear in interface LanguageModel
Parameters:
wordSequence - the word sequence
Returns:
the smear term associated with this word sequence

getBackoff

public float getBackoff(WordSequence wordSequence)
Returns the backoff probability for the given sequence of words

Parameters:
wordSequence - the sequence of words
Returns:
the backoff probability in LogMath log base
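To show how a backoff weight is typically used when a trigram is missing, here is a simplified sketch of backoff scoring in the log domain: if the trigram is absent, the score is the backoff weight of the history bigram plus the log probability of the shorter bigram. The class `BackoffSketch`, its map fields, and the space-joined string keys are all hypothetical; this is not the binary lookup that LargeTrigramModel performs, and it omits the further backoff to unigrams.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model: plain maps stand in for the binary LM file.
final class BackoffSketch {

    final Map<String, Float> trigramLogProb = new HashMap<>();
    final Map<String, Float> bigramLogProb = new HashMap<>();
    final Map<String, Float> bigramLogBackoff = new HashMap<>();

    int trigramHits;
    int trigramMisses;

    /** Returns log P(w3 | w1 w2), backing off to the bigram (w2 w3) on a miss. */
    float trigramScore(String w1, String w2, String w3) {
        Float direct = trigramLogProb.get(w1 + " " + w2 + " " + w3);
        if (direct != null) {
            trigramHits++;
            return direct;
        }
        trigramMisses++;
        // A missing backoff weight is treated as log(1) = 0.
        float backoff = bigramLogBackoff.getOrDefault(w1 + " " + w2, 0.0f);
        // A missing bigram is treated as impossible here; a full model would back off again.
        float lower = bigramLogProb.getOrDefault(w2 + " " + w3, Float.NEGATIVE_INFINITY);
        return backoff + lower; // multiplication of probabilities is addition in the log domain
    }

    public static void main(String[] args) {
        BackoffSketch lm = new BackoffSketch();
        lm.trigramLogProb.put("a b c", -1.0f);
        lm.bigramLogProb.put("b d", -2.0f);
        lm.bigramLogBackoff.put("a b", -0.5f);
        System.out.println("seen trigram:   " + lm.trigramScore("a", "b", "c"));
        System.out.println("unseen trigram: " + lm.trigramScore("a", "b", "d"));
    }
}
```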

getMaxDepth

public int getMaxDepth()
Returns the maximum depth of the language model

Specified by:
getMaxDepth in interface LanguageModel
Returns:
the maximum depth of the language model

getVocabulary

public java.util.Set getVocabulary()
Returns the set of words in the language model. The set is unmodifiable.

Specified by:
getVocabulary in interface LanguageModel
Returns:
the unmodifiable set of words

getBigramMisses

public int getBigramMisses()
Returns the number of times when a bigram is queried, but there is no bigram in the LM (in which case it uses the backoff probabilities).

Returns:
the number of bigram misses

getTrigramMisses

public int getTrigramMisses()
Returns the number of times when a trigram is queried, but there is no trigram in the LM (in which case it uses the backoff probabilities).

Returns:
the number of trigram misses

getTrigramHits

public int getTrigramHits()
Returns the number of trigram hits.

Returns:
the number of trigram hits
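The hit and miss counters can be combined into a simple hit rate for tuning or diagnostics. The sketch below assumes that every trigram query is counted as either a hit or a miss, which this documentation does not guarantee; `TrigramStatsSketch` and `trigramHitRate` are hypothetical names.

```java
// Hypothetical helper for summarizing the counters returned by
// getTrigramHits() and getTrigramMisses().
final class TrigramStatsSketch {

    /** Returns hits / (hits + misses), or 0.0 when nothing has been queried. */
    static double trigramHitRate(int hits, int misses) {
        int total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

With a model instance named `model`, the call would be `TrigramStatsSketch.trigramHitRate(model.getTrigramHits(), model.getTrigramMisses())`.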