Sphinx-4 Live Demo

This is a very simple program that shows the basic speech recognition capabilities of Sphinx-4. When you run the program, the following GUI shows up:

The program prompts you to say a certain sentence (here it prompts you to say "the left most and closest"). You can select the different tasks (e.g., isolated digits, connected digits, spelling) in the "Decoder:" selection box. The text you should say is in the "Say:" box. The recognition results are displayed in the "Recognized:" box.

Building

First, make sure that JSAPI is set up correctly and that all the Sphinx-4 classes are built. To build the classes, go to the top-level directory (sphinx4) and type:

ant

To build this demo, type the following in this directory (tests/live/):

ant

Running

You can run this program in three different modes:

NOTE:

  1. For this demo, you will NOT need to change the location of the acoustic model and dictionary in the properties files. The location specified in the properties files is already correct.
  2. If you are running Linux and have problems with the audio, please read the Linux JavaSound section.

Customizing the Demo

All the tests that the live demo program runs are listed in the "decoders.list" file in this directory (tests/live/). Using the isolated digits test as an example, a test is specified as follows:

isolatedDigits.name = Isolated Digits
isolatedDigits.propertiesFile = ./ti46.props
isolatedDigits.testFile = ./isolatedDigits.test

name - this is the name you will see in the "Decoder:" box in the live program
propertiesFile - the SphinxProperties file for this test
testFile - a list of the sentences the live demo should prompt the user to say

You can create your own demo using one of three types of grammars: a word list grammar, a JSGF grammar, or an N-gram grammar.

The only difference in setup between the three types of grammars is in the configuration file. Let's look at each in turn:

Word List Grammar

To set up a word list grammar test, specify the following in the configuration file:

<component name="flatLinguist"
    type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
    <property name="grammar" value="wordListGrammar"/>
    ...
</component>

<component name="wordListGrammar" 
    type="edu.cmu.sphinx.linguist.language.grammar.SimpleWordListGrammar">
    <property name="path" value="../performance/tidigits/tidigits.wordlist"/>
    <property name="isLooping" value="true"/>
    ...
</component>

You will also need a simple text file listing all the words in your grammar, one word per line. The above setting assumes that this text file is called "tidigits.wordlist".

Set 'isLooping' to 'true' only if you want to be able to repeat the words within an utterance. For example, if you are building an isolated digits decoder, you should set that property to false. If you are building a connected digits decoder, you should set that property to true.
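
As an illustration of the isolated digits case, the word list component might look like the sketch below. It reuses the word list path from the example above and changes only 'isLooping'. The remaining properties are left out here, just as they are in the original fragment.

```xml
<!-- Sketch of an isolated digits setup: one word per utterance, so no looping. -->
<component name="wordListGrammar"
    type="edu.cmu.sphinx.linguist.language.grammar.SimpleWordListGrammar">
    <property name="path" value="../performance/tidigits/tidigits.wordlist"/>
    <!-- false: a word from the list cannot be repeated within an utterance -->
    <property name="isLooping" value="false"/>
    <!-- remaining properties unchanged from the example above -->
</component>
```

For a connected digits decoder, the same fragment applies with 'isLooping' set back to "true".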

JSGF Grammar

NOTE: To build and run demos with JSGF, you must set up JSAPI.

First, you need to write your grammar in JSGF. Several JSGF examples (the files ending in ".gram") exist in the jsgf directory. A complete specification is found at the Java Speech Grammar Format Specification.

To set up a JSGF grammar test, specify in the configuration file:

<component name="flatLinguist"
    type="edu.cmu.sphinx.linguist.flat.FlatLinguist">
    <property name="grammar" value="jsgfGrammar"/>
    ...
</component>

<component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
    <property name="grammarLocation" value="file:./"/>
    <property name="dictionary" value="dictionary"/>
    <property name="grammarName" value="jsgf.hello"/>
    <property name="logMath" value="logMath"/>
</component>

All JSGF grammar files end with ".gram". When you specify the URL of the directory where the JSGF grammar files are (i.e., the 'grammarLocation'), the grammar files will be discovered automatically. Therefore, in addition to the directory, you only need to specify the name of the grammar you want to use, which in this case is 'jsgf.hello'.

N-Gram Grammars

Sphinx-4 currently supports only N-gram grammars generated by the CMU-Cambridge Statistical Language Modeling Toolkit, in both ASCII and binary versions. If your ASCII language model file is called "bigram.txt" and is located in the current directory, specify the following in the configuration file:

<component name="lexTreeLinguist" 
    type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">
    <property name="languageModel" value="bigramModel"/>
    ...
</component>

<component name="bigramModel" 
    type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
    <property name="location" value="bigram.txt"/>
    <property name="maxDepth" value="2"/>
    <property name="unigramWeight" value=".7"/>
    ...
</component>

If your binary language model file is at "/usr/sphinx4/bigram.binary", specify the following in the configuration file:

<component name="bigramModel" 
    type="edu.cmu.sphinx.linguist.language.ngram.large.LargeTrigramModel">
    <property name="location" value="file:/usr/sphinx4/bigram.binary"/>
    <property name="maxDepth" value="2"/>
    <property name="unigramWeight" value=".7"/>
    ...
</component>
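
The ASCII bigram configuration above could plausibly be adapted to a higher-order model by changing the model file and the depth. The sketch below is not taken from the demo: the component name "trigramModel" and the file "trigram.txt" are hypothetical, and it assumes that 'maxDepth' sets the highest N-gram order the linguist will use.

```xml
<!-- Sketch only: "trigramModel" and "trigram.txt" are hypothetical names. -->
<component name="lexTreeLinguist"
    type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">
    <!-- must match the name of the language model component below -->
    <property name="languageModel" value="trigramModel"/>
    <!-- remaining linguist properties as in the bigram example -->
</component>

<component name="trigramModel"
    type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
    <property name="location" value="trigram.txt"/>
    <!-- assumption: maxDepth is the highest N-gram order to use -->
    <property name="maxDepth" value="3"/>
    <property name="unigramWeight" value=".7"/>
    <!-- remaining model properties as in the bigram example -->
</component>
```

Whatever names are chosen, the value of 'languageModel' in the linguist has to match the name of the model component, as it does in the bigram example.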