How to Use Models from SphinxTrain in Sphinx-4

To use models trained with SphinxTrain in Sphinx-4, you need to package them into a JAR file. The advantage of a JAR file is that you can simply include it in the classpath and reference it in the configuration file of a Sphinx-4 application.

The Sphinx-4 build.xml script contains ANT targets that let you easily convert SphinxTrain models to a JAR file. We will walk you through the process of making the models usable in Sphinx-4, using a model called "TOY" as an example. Suppose that SphinxTrain created the following TOY model files:

cd_continuous_8gau/means
cd_continuous_8gau/mixture_weights
cd_continuous_8gau/variances
cd_continuous_8gau/transition_matrices
dict/cmudict.0.6d
dict/fillerdict
etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef
etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ci.mdef

These are the steps to make the trained TOY models usable in Sphinx-4. Errors are very often caused by typos, so take great care when editing the various files.

  1. Create Model Directory

    All model files should be placed under the "sphinx4/models/acoustic" directory. For the TOY models, we create the directory "toy" under "sphinx4/models/acoustic":

    sphinx4> cd models/acoustic
    sphinx4/models/acoustic> mkdir toy
    

  2. Copy Model Files

    Copy all the model files to sphinx4/models/acoustic/toy/. After copying all the model files, the resulting sphinx4/models/acoustic/toy/ directory looks like:

    cd_continuous_8gau/
    cd_continuous_8gau/means
    cd_continuous_8gau/mixture_weights
    cd_continuous_8gau/variances
    cd_continuous_8gau/transition_matrices
    dict/
    dict/cmudict.0.6d
    dict/fillerdict
    etc/
    etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef
    etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ci.mdef
    
    Note that all the files under "cd_continuous_8gau" are binary files in this example.
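
    Since typos and forgotten files are a common source of errors, it can help to check the copied layout before continuing. The following is a minimal sketch, not part of Sphinx-4: the class name ToyModelLayoutCheck and its method are hypothetical, and the file list is simply the listing above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Sphinx-4): reports missing TOY model files. */
final class ToyModelLayoutCheck {

    // Relative paths taken from the directory listing in step 2.
    static final String[] EXPECTED = {
        "cd_continuous_8gau/means",
        "cd_continuous_8gau/mixture_weights",
        "cd_continuous_8gau/variances",
        "cd_continuous_8gau/transition_matrices",
        "dict/cmudict.0.6d",
        "dict/fillerdict",
        "etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef",
        "etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ci.mdef"
    };

    /** Returns the expected files that do not exist under modelDir. */
    static List<String> missingFiles(Path modelDir) {
        List<String> missing = new ArrayList<>();
        for (String relative : EXPECTED) {
            if (!Files.isRegularFile(modelDir.resolve(relative))) {
                missing.add(relative);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Default assumes the program is run from the sphinx4 top-level directory.
        Path modelDir = Paths.get(args.length > 0 ? args[0] : "models/acoustic/toy");
        List<String> missing = missingFiles(modelDir);
        if (missing.isEmpty()) {
            System.out.println("All expected model files found in " + modelDir);
        } else {
            System.out.println("Missing from " + modelDir + ": " + missing);
        }
    }
}
```

    For a differently named model, the EXPECTED list would need to be adjusted to match its own file names.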

  3. Create model.props

    Create a text file called model.props under sphinx4/models/acoustic/toy/. This file must have the following properties:

    description = TOY acoustic models
    modelClass = edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
    modelLoader = edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader
    dataLocation = cd_continuous_8gau
    modelDefinition = etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef
    
    isBinary = true
    featureType = 1s_c_d_dd
    vectorLength = 39
    sparseForm = false
    
    numberFftPoints = 512
    numberFilters = 40
    gaussians = 8
    minimumFrequency = 130
    maximumFrequency = 6800
    sampleRate = 16000
    
    

    Explanation of the properties:

    description: a description of the acoustic model, which is "TOY acoustic models" in this example.
    modelClass: should be set to "edu.cmu.sphinx.model.acoustic.${MEANINGFUL_NAME}.Model". ${MEANINGFUL_NAME} usually contains the following information:
        1. the name of the data set used to train the models (TOY)
        2. the number of gaussians (8gau)
        3. the number of cepstra data points (13dCep)
        4. the sample rate of the training data (16k)
        5. the number of mel filters (40mel)
        6. the minimum frequency (130Hz)
        7. the maximum frequency (6800Hz)
      This ${MEANINGFUL_NAME} will also be the name of your JAR file. In this example, the created JAR file will be called TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar.
    modelLoader: similar to "modelClass", it should be set to "edu.cmu.sphinx.model.acoustic.${MEANINGFUL_NAME}.ModelLoader".
    dataLocation: the directory that holds all the model data files, which in this example is "cd_continuous_8gau". The location is relative to the modelLoader class inside the final JAR file.
    modelDefinition: the location of the .mdef file, which in this example is "etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef". The location is relative to the modelLoader class inside the final JAR file.
    isBinary: whether the model files (i.e., the means, variances, mixture_weights and transition_matrices files) are binary or ASCII.
    featureType: the SphinxTrain name for the type of feature generated from the training data. The name here is 1s_c_d_dd, which means the cepstra, the delta cepstra, and the double delta of the cepstra. Currently, Sphinx-4 only supports models trained from 1s_c_d_dd and s3_1x39 features.
    vectorLength: the length of the feature vector, which is usually 39.
    sparseForm: whether the transition matrices of the acoustic model are in sparse form, i.e., omitting the zeros of the non-transitioning states.
    numberFftPoints: the number of FFT points used when creating features for training.
    numberFilters: the number of filters used when creating features for training.
    gaussians: the number of Gaussians of the generated models.
    maximumFrequency: the maximum frequency of the mel filters used when creating features for training.
    minimumFrequency: the minimum frequency of the mel filters used when creating features for training.
    sampleRate: the sample rate of the training data.
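
    To illustrate how the seven parts of ${MEANINGFUL_NAME} fit together, here is a small sketch that assembles the name and derives the modelClass, modelLoader, and JAR file name from it. The class ModelNameSketch and its methods are hypothetical, written only for this example; the naming pattern itself is the convention described above, not something Sphinx-4 enforces.

```java
/** Hypothetical helper: assembles ${MEANINGFUL_NAME} and the names derived from it. */
final class ModelNameSketch {

    static final String PACKAGE_PREFIX = "edu.cmu.sphinx.model.acoustic.";

    /** Joins the seven parts in the order listed under modelClass. */
    static String meaningfulName(String dataSet, int gaussians, int cepstra,
                                 int sampleRateKHz, int melFilters,
                                 int minHz, int maxHz) {
        return dataSet + "_" + gaussians + "gau_" + cepstra + "dCep_"
                + sampleRateKHz + "k_" + melFilters + "mel_"
                + minHz + "Hz_" + maxHz + "Hz";
    }

    static String modelClass(String name) {
        return PACKAGE_PREFIX + name + ".Model";
    }

    static String modelLoader(String name) {
        return PACKAGE_PREFIX + name + ".ModelLoader";
    }

    static String jarFileName(String name) {
        return name + ".jar";
    }

    public static void main(String[] args) {
        // Values of the TOY example: 8 gaussians, 13 cepstra, 16 kHz,
        // 40 mel filters, 130 Hz to 6800 Hz.
        String name = meaningfulName("TOY", 8, 13, 16, 40, 130, 6800);
        System.out.println("modelClass  = " + modelClass(name));
        System.out.println("modelLoader = " + modelLoader(name));
        System.out.println("JAR file    = " + jarFileName(name));
    }
}
```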

    These properties will be printed out if you run the actual JAR file that was created in step 8, for example:

    sphinx4> java -jar lib/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar
    
    TOY acoustic models
    
    dataLocation: cd_continuous_8gau
    description: TOY acoustic models
    featureType: 1s_c_d_dd
    gaussians: 8
    isBinary: true
    maximumFrequency: 6800
    minimumFrequency: 130
    modelClass: edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
    modelDefinition: etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef
    modelLoader: edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader
    numberFftPoints: 512
    numberFilters: 40
    sampleRate: 16000
    sparseForm: false
    vectorLength: 39
    

    This helps whoever uses your acoustic model to specify the values in their configuration file correctly. Each line displayed here corresponds to a line in the model.props file.
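
    Because a missing or misspelled property name is easy to overlook, a quick check of model.props before rebuilding may save time. The sketch below assumes the file can be read with the standard java.util.Properties parser, which accepts the "name = value" lines shown above; whether Sphinx-4 itself reads the file this way is not stated here. The class ModelPropsCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: lists properties from step 3 that are absent or empty. */
final class ModelPropsCheck {

    // The property names shown in the model.props example of step 3.
    static final String[] REQUIRED = {
        "description", "modelClass", "modelLoader", "dataLocation",
        "modelDefinition", "isBinary", "featureType", "vectorLength",
        "sparseForm", "numberFftPoints", "numberFilters", "gaussians",
        "minimumFrequency", "maximumFrequency", "sampleRate"
    };

    /** Parses the text of a model.props file and returns the missing names. */
    static List<String> missingKeys(String propsText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propsText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Default assumes the program is run from the sphinx4 top-level directory.
        String file = args.length > 0 ? args[0] : "models/acoustic/toy/model.props";
        String text = new String(Files.readAllBytes(Paths.get(file)),
                                 StandardCharsets.ISO_8859_1);
        System.out.println("Missing properties: " + missingKeys(text));
    }
}
```

    This only checks that each name is present. It does not verify that the values match the settings used during training.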


  4. Add ANT Properties in build.xml

    Modify build.xml, which is the ANT script that creates the acoustic model JAR files. First, define properties for the name of your acoustic model and for the directory that contains your acoustic model data. The name of your acoustic model should be the ${MEANINGFUL_NAME}, which in our example is "TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz". Therefore, we will add the following in the section of build.xml that says "For generating the WSJ...":

    <property name="toy_name" value="TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz"/>
    <property name="toy_data_dir" value="models/acoustic/toy"/>
    

  5. Create Model Classes in build.xml

    Search for the ANT target "create_all_model_classes". Add lines after the last "antcall" to create your model classes. In our example, we will add the following lines:

    <antcall target="create_my_model_classes">
        <param name="my_model_name" value="${toy_name}"/>
    </antcall>
    

  6. Delete Model Classes in build.xml

    This is the cleanup step. Search for the ANT target "delete_all_model_classes". Add lines after the last "antcall" to delete your model classes. In our example, we will add the following lines:

    <antcall target="delete_my_model_classes">
        <param name="my_model_name" value="${toy_name}"/>
    </antcall>
    

  7. Create Model JAR in build.xml

    Search for the ANT target "create_all_models". Add lines after the last "antcall" to create your models. In our example, we will add the following lines:

    <antcall target="create_my_model">
        <param name="my_model_data_dir" value="${toy_data_dir}"/>
        <param name="my_model_name" value="${toy_name}"/>
    </antcall>
    
    This is the last step in the editing of the build.xml file.

  8. Rebuild

    In the top-level directory, type "ant". This should build all the acoustic model JAR files, which will be placed in the "lib" directory.


  9. Specify Model Class in Config File

    In your Sphinx-4 configuration file, you usually need to reference the acoustic model JAR file in two places: the acoustic model component and the dictionary component. For example, the acoustic model should be specified as:

    <component name="toy" 
               type="edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
        <property name="loader" value="sphinx3Loader"/>
        <property name="unitManager" value="unitManager"/>
    </component>
    
    <component name="sphinx3Loader"
               type="edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
        <property name="logMath" value="logMath"/>
        <property name="unitManager" value="unitManager"/>
    </component>
    

    Most config files contain an example of unitManager. Note that edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model is the class of your acoustic model, which is in the final JAR file. As long as the JAR file is in your CLASSPATH, Java will find it.

    The dictionary file is usually packaged within the acoustic model JAR file. Inside the JAR file, the cmudict.0.6d is located at /edu/cmu/sphinx/model/acoustic/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/. Inside the configuration file, it should be specified as:

    <component name="dictionary"
               type="edu.cmu.sphinx.linguist.dictionary.FullDictionary">
        <property name="dictionaryPath"
                  value="resource:/edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
        ...
    </component>
    

    The value="resource:..." line says that the dictionary is located in the resource that contains the edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model class, which is the acoustic model JAR file. Within that resource, the dictionary is at /edu/cmu/sphinx/model/acoustic/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d. The "fillerPath" property is specified in the same way.
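
    The resource value is long and easy to mistype, so it can be convenient to generate it from the model class name. The sketch below builds the string in the form used above: "resource:/", the model class, "!/", the package of that class written as a path, and the file inside the model. The class ResourcePathSketch is hypothetical, and the fillerPath value it prints assumes the filler dictionary is packaged next to cmudict.0.6d as dict/fillerdict.

```java
/** Hypothetical helper: builds "resource:/<class>!/<package path>/<file>" values. */
final class ResourcePathSketch {

    /**
     * modelClass is the fully qualified Model class, for example
     * edu.cmu.sphinx.model.acoustic.${MEANINGFUL_NAME}.Model.
     * fileInModel is a path relative to that class's package, such as
     * dict/cmudict.0.6d.
     */
    static String resourcePath(String modelClass, String fileInModel) {
        int lastDot = modelClass.lastIndexOf('.');
        if (lastDot < 0) {
            throw new IllegalArgumentException(
                    "Expected a fully qualified class name: " + modelClass);
        }
        // The package of the Model class becomes the directory inside the JAR.
        String packagePath = modelClass.substring(0, lastDot).replace('.', '/');
        return "resource:/" + modelClass + "!/" + packagePath + "/" + fileInModel;
    }

    public static void main(String[] args) {
        String modelClass =
            "edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model";
        System.out.println("dictionaryPath = "
                + resourcePath(modelClass, "dict/cmudict.0.6d"));
        // Assumption: the filler dictionary sits at dict/fillerdict in the JAR.
        System.out.println("fillerPath     = "
                + resourcePath(modelClass, "dict/fillerdict"));
    }
}
```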


  10. Include JAR File in Classpath

    Finally, remember to include the model JAR file, which in our example is TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar, in your Java CLASSPATH.
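
    One way to confirm that the JAR file is really on the classpath is to try to look up the model class by name before starting the recognizer. This is only a sketch using the standard Class.forName call; the class ClasspathCheck is hypothetical and not part of Sphinx-4.

```java
/** Hypothetical helper: tests whether a class can be found on the classpath. */
final class ClasspathCheck {

    /** Returns true if className can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String modelClass = args.length > 0 ? args[0]
            : "edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model";
        if (isOnClasspath(modelClass)) {
            System.out.println("Found " + modelClass);
        } else {
            System.out.println("Not found: " + modelClass
                    + " (is the model JAR file in the CLASSPATH?)");
        }
    }
}
```

    Run it with the same classpath as your Sphinx-4 application, so that the result reflects what the application will see.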