edu.cmu.sphinx.linguist.language.grammar
Class FSTGrammar

java.lang.Object
  extended by edu.cmu.sphinx.linguist.language.grammar.Grammar
      extended by edu.cmu.sphinx.linguist.language.grammar.FSTGrammar
All Implemented Interfaces:
Configurable

public class FSTGrammar
extends Grammar

Loads a grammar from a file representing a finite-state transducer (FST) in the 'ARPA' grammar format. An example of the ARPA FST format follows; the key after the example explains each line type:

  I 2
  F 0 2.30259
  T 0 1 <unknown> <unknown> 2.30259
  T 0 4 wood wood 1.60951
  T 0 5 cindy cindy 1.60951
  T 0 6 pittsburgh pittsburgh 1.60951
  T 0 7 jean jean 1.60951
  F 1 2.89031
  T 1 0 , , 0.587725
  T 1 4 wood wood 0.58785
  F 2 3.00808
  T 2 0 , , 0.705491
  T 2 1 <unknown> <unknown> 0.58785
  F 3 2.30259
  T 3 0
  F 4 2.89031
  T 4 0 , , 0.587725
  T 4 6 pittsburgh pittsburgh 0.58785
  F 5 2.89031
  T 5 0 , , 0.587725
  T 5 7 jean jean 0.58785
  F 6 2.89031
  T 6 0 , , 0.587725
  T 6 5 cindy cindy 0.58785
  F 7 1.28093
  T 7 0 , , 0.454282
  T 7 4 wood wood 1.28093  
   
Key:
  I - initial node, so "I 2" means node 2 is the initial node
  F - final node, e.g., "F 0 2.30259" means that node 0 is a final node,
  and the probability of finishing at node 0 is 2.30259 (in -ln)
  T - transition, e.g., "T 0 4 wood wood 1.60951" means "transitioning
  from node 0 to node 4, the output is 'wood' and the machine is now
  in the node for 'wood', and the probability associated with the
  transition is 1.60951 (in -ln)". "T 6 0 , , 0.587725" is
  a backoff transition: the output is null (epsilon in
  the picture), and the machine is now in the null node.
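The key above can be illustrated with a small line parser. This is a minimal sketch in plain Java, not the parser FSTGrammar itself uses; the class FstLineSketch, its Entry type, and the outputsWord helper are hypothetical names introduced here. It assumes fields are separated by whitespace and that a "T" line may omit its labels and probability, as "T 3 0" does in the example.

```java
final class FstLineSketch {
    /** One parsed line; fields that do not apply keep their defaults. */
    static final class Entry {
        char kind;          // 'I', 'F', or 'T'
        int from;           // node ID (the only node on I and F lines)
        int to = -1;        // destination node, T lines only
        String input;       // input label, T lines only, may be null
        String output;      // output label, T lines only, may be null
        double negLnProb;   // -ln probability on F and T lines, 0.0 when absent

        /** True for a transition that emits a word (not "," and not missing). */
        boolean outputsWord() {
            return kind == 'T' && output != null && !",".equals(output);
        }
    }

    static Entry parse(String line) {
        String[] tok = line.trim().split("\\s+");
        if (tok.length < 2 || tok[0].isEmpty()) {
            throw new IllegalArgumentException("Not an FST line: " + line);
        }
        Entry e = new Entry();
        e.kind = tok[0].charAt(0);
        e.from = Integer.parseInt(tok[1]);
        switch (e.kind) {
            case 'I':
                break;
            case 'F':
                if (tok.length > 2) e.negLnProb = Double.parseDouble(tok[2]);
                break;
            case 'T':
                if (tok.length < 3) {
                    throw new IllegalArgumentException("Missing destination: " + line);
                }
                e.to = Integer.parseInt(tok[2]);
                if (tok.length >= 5) { e.input = tok[3]; e.output = tok[4]; }
                if (tok.length >= 6) e.negLnProb = Double.parseDouble(tok[5]);
                break;
            default:
                throw new IllegalArgumentException("Unknown line type: " + line);
        }
        return e;
    }

    public static void main(String[] args) {
        Entry t = parse("T 0 4 wood wood 1.60951");
        System.out.println(t.from + " -> " + t.to + " emits " + t.output);
        System.out.println("backoff emits word? " + parse("T 6 0 , , 0.587725").outputsWord());
    }
}
```

Treating "," as the null label follows the key; how the real loader handles a label-free line such as "T 3 0" is not stated in this page, so the sketch simply leaves the labels unset.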
   

Probabilities read in from the FST file are in negative natural log format and are converted to the internal logMath log base.
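The arithmetic behind this conversion is short enough to sketch. A file value c is -ln(p), so ln(p) = -c, and the log of p in another base b is -c / ln(b). The class and method names below are hypothetical, and the base 1.0001 is only an assumed illustrative value; the base actually used comes from the configured logMath component.

```java
final class NegLnConversionSketch {
    /** Converts a -ln(p) value from the FST file to log_base(p). */
    static double negLnToLogBase(double negLn, double base) {
        // ln(p) = -negLn, and log_base(p) = ln(p) / ln(base)
        return -negLn / Math.log(base);
    }

    /** Recovers the linear probability p from its -ln(p) value. */
    static double negLnToLinear(double negLn) {
        return Math.exp(-negLn);
    }

    public static void main(String[] args) {
        double fileValue = 2.30259;   // from the line "F 0 2.30259"
        double assumedBase = 1.0001;  // assumption for illustration only
        System.out.println("linear probability: " + negLnToLinear(fileValue));
        System.out.println("log in assumed base: " + negLnToLogBase(fileValue, assumedBase));
    }
}
```

Because 2.30259 is approximately ln(10), the final-node value in the example corresponds to a probability of roughly 0.1.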

As the FST file is read in, a Grammar object that is structurally equivalent to the FST is created. The steps of converting the FST file to a Grammar object are:

  1. Create all the Grammar nodes
    Go through the entire FST file and for each word transition, take the destination node ID and create a grammar node using that ID. These nodes are kept in a hashtable to make sure they are created once for each ID. Therefore, we get one word per grammar node.

  2. Create an end node for each Grammar node
    This end node is used for backoff transitions into the Grammar node, so that they do not go through the word itself, but instead go directly to the end of the word. In addition, an optional silence node is added between the grammar node and its end node. The result of this step on each grammar node (shown in Figure 1 below as the circle with "word") is as follows. The end node is the empty circle at the far right:

    Figure 1: Addition of end node and the optional silence.

  3. Create the transitions
    Read through the entire FST file, and for each line indicating a transition, connect up the corresponding Grammar nodes. Backoff transitions and null transitions (i.e., the ones that do not output a word) will be linked to the end node of a grammar node.
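The three steps above can be sketched with plain Java collections. This is an illustration only, not the FSTGrammar implementation: Node is a hypothetical stand-in for a grammar node, "<sil>" is a placeholder label for the optional silence, and two choices are assumptions of the sketch rather than facts from this page: node IDs that never receive a word get a wordless node, and every transition leaves from the end node of its source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FstToGraphSketch {
    /** Hypothetical stand-in for a grammar node. */
    static final class Node {
        final String label;
        final List<Node> successors = new ArrayList<>();
        final List<Double> costs = new ArrayList<>(); // -ln values, parallel to successors
        Node end;                                     // set in step 2 for per-ID nodes
        Node(String label) { this.label = label; }
        void link(Node target, double cost) { successors.add(target); costs.add(cost); }
    }

    /** A transition outputs a word when it has labels and the output is not ",". */
    static boolean isWord(String[] tok) {
        return tok.length >= 5 && !",".equals(tok[4]);
    }

    static Map<Integer, Node> build(List<String> lines) {
        Map<Integer, Node> nodes = new HashMap<>();
        List<String[]> transitions = new ArrayList<>();

        // Step 1: one node per destination ID of a word transition.
        for (String line : lines) {
            String[] tok = line.trim().split("\\s+");
            if (tok.length < 3 || !tok[0].equals("T")) continue;
            transitions.add(tok);
            if (isWord(tok)) {
                nodes.computeIfAbsent(Integer.parseInt(tok[2]), id -> new Node(tok[4]));
            }
        }
        // Assumption of this sketch: remaining IDs get a wordless node.
        for (String[] tok : transitions) {
            nodes.computeIfAbsent(Integer.parseInt(tok[1]), id -> new Node(""));
            nodes.computeIfAbsent(Integer.parseInt(tok[2]), id -> new Node(""));
        }

        // Step 2: add an end node, reachable directly or through optional silence.
        // A cost of 0.0 in -ln corresponds to probability 1.
        for (Node n : nodes.values()) {
            Node silence = new Node("<sil>");
            n.end = new Node("");
            n.link(n.end, 0.0);
            n.link(silence, 0.0);
            silence.link(n.end, 0.0);
        }

        // Step 3: word transitions enter the word node; backoff and null
        // transitions skip the word and enter the destination's end node.
        for (String[] tok : transitions) {
            Node from = nodes.get(Integer.parseInt(tok[1]));
            Node to = nodes.get(Integer.parseInt(tok[2]));
            double cost = tok.length >= 6 ? Double.parseDouble(tok[5]) : 0.0;
            from.end.link(isWord(tok) ? to : to.end, cost);
        }
        return nodes;
    }

    public static void main(String[] args) {
        Map<Integer, Node> g = build(List.of(
                "I 2", "T 0 4 wood wood 1.60951", "T 4 0 , , 0.587725"));
        System.out.println("node 4 word: " + g.get(4).label);
    }
}
```

Keeping the nodes in a map keyed by ID mirrors the hashtable mentioned in step 1, so each ID is created at most once however many transitions refer to it.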


Field Summary
static java.lang.String PROP_LOG_MATH
          Sphinx property that defines the logMath component.
static java.lang.String PROP_PATH
          The SphinxProperty for the location of the FST n-gram file.
static java.lang.String PROP_PATH_DEFAULT
          The default value for PROP_PATH.
 
Fields inherited from class edu.cmu.sphinx.linguist.language.grammar.Grammar
PROP_ADD_FILLER_WORDS, PROP_ADD_FILLER_WORDS_DEFAULT, PROP_ADD_SIL_WORDS, PROP_ADD_SIL_WORDS_DEFAULT, PROP_DICTIONARY, PROP_OPTIMIZE_GRAMMAR, PROP_OPTIMIZE_GRAMMAR_DEFAULT, PROP_SHOW_GRAMMAR, PROP_SHOW_GRAMMAR_DEFAULT
 
Constructor Summary
FSTGrammar()
           
 
Method Summary
 void newProperties(PropertySheet ps)
          This method is called when this configurable component has new data.
 void register(java.lang.String name, Registry registry)
          Register my properties.
 
Methods inherited from class edu.cmu.sphinx.linguist.language.grammar.Grammar
allocate, deallocate, dumpGrammar, dumpRandomSentences, dumpRandomSentences, dumpStatistics, getGrammarNodes, getInitialNode, getName, getNumNodes, getRandomSentence
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PROP_PATH

public static final java.lang.String PROP_PATH
The SphinxProperty for the location of the FST n-gram file.

See Also:
Constant Field Values

PROP_PATH_DEFAULT

public static final java.lang.String PROP_PATH_DEFAULT
The default value for PROP_PATH.

See Also:
Constant Field Values

PROP_LOG_MATH

public static final java.lang.String PROP_LOG_MATH
Sphinx property that defines the logMath component.

See Also:
Constant Field Values
Constructor Detail

FSTGrammar

public FSTGrammar()
Method Detail

register

public void register(java.lang.String name,
                     Registry registry)
              throws PropertyException
Description copied from interface: Configurable
Register my properties. This method is called once early in the lifetime of the component, shortly after the component is constructed. This component should register any configuration properties that it needs to register. If this configurable extends another configurable, super.register should also be called.

Specified by:
register in interface Configurable
Overrides:
register in class Grammar
Throws:
PropertyException

newProperties

public void newProperties(PropertySheet ps)
                   throws PropertyException
Description copied from interface: Configurable
This method is called when this configurable component has new data. The component should first validate the data. If it is bad, the component should return false. If the data is good, the component should record the data internally and return true.

Specified by:
newProperties in interface Configurable
Overrides:
newProperties in class Grammar
Throws:
PropertyException