Grammar Guide


Testing Speech Grammars

Several utility programs will assist you in evaluating the grammars that you write:

  • fsgenum.exe
  • fsgtest.exe
  • fsgprint.exe

This document provides a brief description of these tools.


FSGENUM.EXE

The program fsgenum.exe generates strings that are accepted by a specific grammar. This tool is especially useful in determining whether a grammar overgenerates, that is, whether the grammar accepts strings that it should not. For example, suppose we want a grammar that will parse (or accept) the following sentences:

   The boy saw the dog.
   The dog saw the boy.
   The cat saw the dog.
   He saw the dog.

and we write the following simple grammar (see writing grammar files):

;; saw.bnf
;;
<s>  = <np> <vp> .
<vp> = <v> <np>  .
<np> = <det> <n> |
       <pro>     .

; lexicon
<det> = the .
<n>  = dog |
       boy |
       cat .
<pro> = he .
<v>   = saw .

After compiling this grammar using vtbnfc.exe, we want to examine which strings the grammar accepts. Using fsgenum.exe (the details are explained below), we see that it generates the following:

  • the boy saw the dog
  • the cat saw the dog
  • the dog saw the boy
  • he saw the cat
  • the dog saw the dog
  • the dog saw he

As we can now see, the grammar overgenerates: it accepts a string that is not in the language, namely "the dog saw he". We can now rewrite the grammar to prevent this string from being accepted. To give a more practical example, rules that allow for the acceptance of "twelve hundred dollars" might also contribute to the acceptance of the ill-formed "four thousand twelve hundred dollars". Overgeneration is a common error in writing complex grammars, and this tool is a useful debugging aid.
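The overgeneration check can also be sketched outside the tool. The following Python fragment is only an illustration, not how fsgenum.exe works internally: it assumes a hand-transcribed copy of saw.bnf (the names GRAMMAR and expand are hypothetical), enumerates every string the grammar derives, and picks out the strings that end in the pronoun.

```python
from itertools import product

# Hand-transcribed copy of saw.bnf (an assumption for illustration):
# each nonterminal maps to a list of alternatives, and each
# alternative is a sequence of symbols.
GRAMMAR = {
    "s": [["np", "vp"]],
    "vp": [["v", "np"]],
    "np": [["det", "n"], ["pro"]],
    "det": [["the"]],
    "n": [["dog"], ["boy"], ["cat"]],
    "pro": [["he"]],
    "v": [["saw"]],
}

def expand(symbol):
    """Return every word sequence (as a tuple) derivable from symbol."""
    if symbol not in GRAMMAR:              # terminal word
        return [(symbol,)]
    results = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(s) for s in alternative]
        # Combine one expansion of each symbol, in order.
        for combo in product(*parts):
            results.append(tuple(w for part in combo for w in part))
    return results

generated = {" ".join(words) for words in expand("s")}

# Strings with the pronoun in object position are the overgenerated ones.
pronoun_objects = sorted(s for s in generated if s.endswith(" he"))
for sentence in pronoun_objects:
    print(sentence)
```

This works only because saw.bnf has no recursion or repetition, so the set of strings is finite. The pronoun appears in object position because the rule for the verb phrase reuses the same noun phrase rule as the subject.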

    Syntax of fsgenum.exe

    The syntax of this tool is:
    fsgenum [options] fsg-filename
    
    The available options are:
    -r num
    This option generates a set of num strings described by the fsg file. For example, fsgenum -r 10 saw.fsg will display 10 strings that saw.fsg will accept.
    -R num
    This option generates a random set of num strings described by the fsg file. That is, a different set of strings is generated each time the fsgenum -R num command is issued, whereas repeated uses of the fsgenum -r num command generate the same set of strings.
    -f
    This option generates the full set of strings produced by the fsg file.
    -a
    This option shows the annotations associated with each generated string.
    -p
    This option displays the probability associated with each string.
    -l num
    This option defines the maximum number of times a Kleene star or plus is iterated.

    Example

    We compile the following BNF file to produce np.fsg:
    ;; np.bnf
    ;;
    <np>  = <det> <adj>* <n>.
    
    ; lexicon
    
    <adj> = old .
    <det> = the .
    <n>   = dog .
    

    fsgenum np.fsg and fsgenum -f np.fsg both produce the following output:

    the dog
    the old dog
    
    
    fsgenum -f -l 5 np.fsg produces the output:
    the dog
    the old dog
    the old old dog
    the old old old dog
    the old old old old dog
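Why a repetition limit is needed can be shown with a small Python sketch. It is not the tool's algorithm: the function enumerate_np and its parameter max_repeats are hypothetical, and max_repeats is simply the largest number of adjectives allowed. How the -l value maps onto that count is determined by fsgenum.exe itself and is not assumed here.

```python
def enumerate_np(max_repeats):
    """List the strings of <det> <adj>* <n> with at most max_repeats adjectives."""
    strings = []
    for count in range(max_repeats + 1):
        # Zero or more copies of the adjective between determiner and noun.
        words = ["the"] + ["old"] * count + ["dog"]
        strings.append(" ".join(words))
    return strings

for line in enumerate_np(4):
    print(line)
```

Without some bound the loop would never end, because the starred rule describes infinitely many strings. With max_repeats set to 4, the sketch prints the same five strings as the listing above.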
    
    


    FSGTEST.EXE

    This tool determines whether a given string (received from standard input) is accepted by the specified grammar. For example, using the np.fsg file described immediately above, we start this tool by issuing the command:
    fsgtest np.fsg
    
    
    After issuing this command, the user can type in a string followed by the Enter key. (Note that the program does not display a prompt.) If the entered string is accepted by np.fsg, the string is echoed. If it is rejected, the string is echoed with a preceding asterisk. This is illustrated in the following example, where each line of user input is followed by the program's response.
    
    fsgtest np.fsg
    the dog
      the dog
    the old
    * the old
    the old dog
      the old dog
    [ctrl-c]
    The external process was cancelled by a Ctrl+Break or another process.
    
    
    A set of test strings can be input to this program by using file redirection. For example, suppose we have a currency grammar (compiled as currency.fsg) that we would like to test. We have a file, good-set, containing strings we would like the grammar to accept:
    twelve hundred dollars
    four thousand two hundred and eleven dollars
    a dollar fifty
    a dollar and a half
    one and a half dollars
    
    We also have a file, bad-set, of ill-formed strings that we want the grammar to reject:
    four thousand twelve hundred dollars
    four dollars fifty
    one dollar and a half
    
    We can test good-set by using the command:
    fsgtest currency.fsg < good-set
    
    which results in the following output:
      twelve hundred dollars
      four thousand two hundred and eleven dollars
      a dollar fifty
      a dollar and a half
      one and a half dollars
    
    We can test bad-set by using
    fsgtest currency.fsg < bad-set
    
    which results in the following output:
    * four thousand twelve hundred dollars
    * four dollars fifty
    * one dollar and a half
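When the test sets grow, the output can be checked mechanically. The Python sketch below assumes the output of fsgtest has been captured as text (for instance by redirecting it to a file) and relies only on the convention described above: rejected strings are echoed with a leading asterisk. The function name split_results and the sample texts are hypothetical.

```python
def split_results(output_text):
    """Split fsgtest-style output into (accepted, rejected) string lists."""
    accepted, rejected = [], []
    for line in output_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                       # ignore blank lines
        if stripped.startswith("*"):
            # Rejected strings are echoed with a leading asterisk.
            rejected.append(stripped.lstrip("*").strip())
        else:
            accepted.append(stripped)
    return accepted, rejected

# Sample captured output (abbreviated from the listings above).
good_output = """\
  twelve hundred dollars
  a dollar fifty
"""
bad_output = """\
* four thousand twelve hundred dollars
* four dollars fifty
"""

good_accepted, good_rejected = split_results(good_output)
bad_accepted, bad_rejected = split_results(bad_output)
print("unexpected rejections:", good_rejected)
print("unexpected acceptances:", bad_accepted)
```

A grammar passes both sets when the two printed lists are empty: nothing from good-set was rejected and nothing from bad-set was accepted.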
    

    Syntax of fsgtest.exe

    The syntax of this tool is:

    fsgtest [options] fsg-filename
    
    The available options are:

    -a
    This option shows the annotations associated with the accepted input string.
    -p
    This option displays the probability associated with each string.

    Example

    (Once again, each line of user input is followed by the program's response.)
    
    fsgtest -p np.fsg
    the dog
    0.50029 the dog
    the old
    ******* the old
    the old dog
    0.25029 the old dog
    [ctrl-c]
    The external process was cancelled by a Ctrl+Break or another process.
    
    
    To illustrate the use of annotations we compile the following bnf file to produce np2.fsg:
    ;; np2.bnf
    ;;
    <np>  = <det> <adj>* <n>.
    
    ; lexicon
    
    <adj> = old:"adjective" .
    <det> = the:"determiner" .
    <n>   = dog:"noun" .
    

    Now we use fsgtest with the -a option:

    
    fsgtest -a np2.fsg
    the dog
      the:"determiner" dog:"noun"
    the old
      the:"determiner" old:"adjective"
    the old dog
      the:"determiner" old:"adjective" dog:"noun"
    [ctrl-c]
    The external process was cancelled by a Ctrl+Break or another process.
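If the annotated output is to be processed further, each line can be split into word and annotation pairs. The following Python sketch assumes only the word:"annotation" layout shown in the example above. The pattern PAIR and the function parse_annotated are hypothetical names, and annotations containing a double quote are not handled.

```python
import re

# Matches one word:"annotation" pair as shown in the -a output above.
PAIR = re.compile(r'(\S+?):"([^"]*)"')

def parse_annotated(line):
    """Return a list of (word, annotation) tuples from one output line."""
    return PAIR.findall(line)

pairs = parse_annotated('  the:"determiner" old:"adjective" dog:"noun"')
for word, annotation in pairs:
    print(word, "->", annotation)
```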
    
     


    FSGPRINT.EXE

    This tool displays the fsg file as a graph. For each state in the graph it displays the transitions from that state. For each transition it displays both the terminal word associated with the transition and the local probability of that transition. For example, using the np.fsg listed above:
    fsgprint np.fsg
    
     --- fsg np.fsg ---
    
    state 0
             the  -> state 1    p=1.00
    state 1
             dog  -> state *    p=0.50
             old  -> state 1    p=0.50
    state *
    
    
    
    This shows that the grammar has three states (state 0, state 1, and state *). There is only one transition from state 0. It accepts the word "the" and goes to state 1. If you are in state 0, the probability of taking this transition is 1.00, since it is the only transition. State 1 has two transitions. The first transition accepts the word "dog" and goes to the final state, state *. The second transition accepts the word "old" and loops back to state 1. Since there are two transitions out of state 1, the local probability of each of these transitions is 0.50.
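The graph can be read as a small state machine. The Python sketch below transcribes the transitions from the listing by hand and walks them word by word. It assumes that the probability of a whole string is the product of the local probabilities along its path; that is an assumption made for illustration, not something stated about the tools, and the values reported by fsgtest -p in the earlier example are close to these products but not identical. The names GRAPH and path_probability are hypothetical.

```python
# Transitions transcribed from the fsgprint listing above:
# state -> {word: (next_state, local_probability)}
GRAPH = {
    "0": {"the": ("1", 1.00)},
    "1": {"dog": ("*", 0.50), "old": ("1", 0.50)},
    "*": {},
}

def path_probability(sentence, start="0", final="*"):
    """Return the product of local probabilities, or None if rejected."""
    state, probability = start, 1.0
    for word in sentence.split():
        if word not in GRAPH[state]:
            return None                    # no transition for this word
        state, local = GRAPH[state][word]
        probability *= local
    # The string is accepted only if the walk ends in the final state.
    return probability if state == final else None

print(path_probability("the dog"))      # 0.5
print(path_probability("the old dog"))  # 0.25
print(path_probability("the old"))      # None
```

The string "the old" is rejected because the walk stops in state 1 rather than in the final state, which matches the asterisk that fsgtest prints for it.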

    Syntax of fsgprint.exe

    The syntax of this tool is:

    fsgprint [options] fsg-filename
    
    The available options are:

    -v
    This option displays more detailed information about the fsg file, such as the values of certain engine parameters and a more explicit description of the graph.

