Sphinx-4 Live Demo
This is a very simple program that shows the basic speech recognition capabilities of Sphinx-4. When you run the program, the following GUI shows up:
The program prompts you to say a certain sentence (here it prompts you to say "the left most and closest"). You can select the different tasks (e.g., isolated digits, connected digits, spelling) in the "Decoder:" selection box. The text you should say is in the "Say:" box. The recognition results are displayed in the "Recognized:" box.
First, make sure that JSAPI is set up correctly. Then make sure that all the Sphinx-4 classes are built. Go to the top-level directory (sphinx4), and type:
ant
To build this demo, type the following in this directory (tests/live/):
ant
You can run this program in three different modes:
ant live
ant live-ep
ant live-free
NOTE:
All the tests that the live demo program will run are listed in the "decoders.list" file in this directory (tests/live/). Using the isolated digits test as an example, a test is specified as follows:
isolatedDigits.name = Isolated Digits
isolatedDigits.propertiesFile = ./ti46.props
isolatedDigits.testFile = ./isolatedDigits.test
name - the name you will see in the "Decoder:" box in the live program
propertiesFile - the SphinxProperties file for this test
testFile - a list of the sentences the live demo should prompt the user to say
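As a sketch of how a further test could be registered, the entry below follows the same three-key pattern as the isolated digits example. The prefix "myCommands" and both file names are hypothetical placeholders, and the `#` comment line assumes that "decoders.list" is read as a standard Java properties file, which this page does not confirm. Remove the comment if the file turns out not to accept it.

```properties
# Hypothetical entry: the "myCommands" prefix and both file names are placeholders
myCommands.name = My Commands
myCommands.propertiesFile = ./myCommands.props
myCommands.testFile = ./myCommands.test
```

Here the shared prefix groups the three keys into one test, and the value of "name" is what would appear in the "Decoder:" box.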
You can create your own demo using one of three types of grammars: a simple word list, a JSGF grammar, or an N-gram language model.
The only difference in setup between the three types of grammars is in the configuration file. Let's look at each in turn:
To set up a word list grammar test, specify the following in the configuration file:
<component name="flatLinguist" type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
    <property name="grammar" value="wordListGrammar"/>
    ...
</component>

<component name="wordListGrammar" type="edu.cmu.sphinx.linguist.language.grammar.SimpleWordListGrammar">
    <property name="path" value="../performance/tidigits/tidigits.wordlist"/>
    <property name="isLooping" value="true"/>
    ...
</component>
You will also need a simple text file listing all the words in your grammar, one word per line. The above setting assumes that this text file is called "tidigits.wordlist".
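For illustration, a word list for a digits task might look like the following. The words shown are an assumption (the eleven digit words usually associated with the TIDIGITS vocabulary), not a copy of the shipped "tidigits.wordlist". Spelling and case should match whatever your dictionary defines.

```text
one
two
three
four
five
six
seven
eight
nine
zero
oh
```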
Set 'isLooping' to 'true' only if you want to be able to repeat the words within an utterance. For example, if you are building an isolated digits decoder, you should set that property to false. If you are building a connected digits decoder, you should set that property to true.
NOTE: To build and run demos with JSGF, you must set up JSAPI.
First, you need to write your grammar in JSGF. Several JSGF examples (the files ending in ".gram") exist in the jsgf directory. The complete specification is the Java Speech Grammar Format Specification.
To set up a JSGF grammar test, specify in the configuration file:
<component name="flatLinguist" type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
    <property name="grammar" value="jsgfGrammar"/>
    ...
</component>

<component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
    <property name="grammarLocation" value="file:./"/>
    <property name="dictionary" value="dictionary"/>
    <property name="grammarName" value="jsgf.hello"/>
    <property name="logMath" value="logMath"/>
</component>
All JSGF grammar files end with ".gram". When you specify the URL of the directory where the JSGF grammar files are (i.e., the 'grammarLocation'), the grammar files will be discovered automatically. Therefore, in addition to the directory, you only need to specify the name of the grammar you want to use, which in this case is 'jsgf.hello'.
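To make the grammar name concrete, here is a minimal sketch of what a grammar named 'jsgf.hello' could contain. The rule name and the words are invented for illustration and are not the contents of the shipped example. The sketch also assumes that the dotted name maps to a file "hello.gram" inside the jsgf directory under 'grammarLocation', and that every word has an entry in the configured dictionary.

```jsgf
#JSGF V1.0;

// Hypothetical grammar: the rule name <greeting> and its words are placeholders.
// The declared name must match the 'grammarName' property ("jsgf.hello").
grammar jsgf.hello;

// One public rule: a greeting followed by an optional "world".
public <greeting> = (hello | good morning) [world];
```

With this grammar, utterances such as "hello", "hello world", and "good morning world" would be accepted.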
Currently, Sphinx-4 supports only N-gram language models generated by the CMU-Cambridge Statistical Language Modeling Toolkit. Both the ASCII and binary formats are supported. If your ASCII language model file is called "bigram.txt" and is located in the current directory, specify the following in the configuration file:
<component name="lexTreeLinguist" type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">
    <property name="languageModel" value="bigramModel"/>
    ...
</component>

<component name="bigramModel" type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
    <property name="location" value="bigram.txt"/>
    <property name="maxDepth" value="2"/>
    <property name="unigramWeight" value=".7"/>
    ...
</component>
If your binary language model file is at "/usr/sphinx4/bigram.binary", specify the following in the configuration file:
<component name="bigramModel" type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">
    <property name="location" value="file:/usr/sphinx4/bigram.binary"/>
    <property name="maxDepth" value="2"/>
    <property name="unigramWeight" value=".7"/>
    ...
</component>