How to Use Models from SphinxTrain in Sphinx-4
To use models trained with SphinxTrain, you need to package them into a JAR file. The advantage of a JAR file is that it can simply be included in the classpath and referenced in the configuration file for use in a Sphinx-4 application.
The Sphinx-4 build.xml script contains ANT targets that let you easily convert SphinxTrain models to a JAR file. We will walk you through the process of making the models usable in Sphinx-4, using a model called "TOY" as an example. Suppose that the following TOY model files are created by SphinxTrain:
cd_continuous_8gau/means
cd_continuous_8gau/mixture_weights
cd_continuous_8gau/variances
cd_continuous_8gau/transition_matrices
dict/cmudict.0.6d
dict/fillerdict
etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef
etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ci.mdef
These are the steps to make the trained TOY models usable in Sphinx-4. Typos are a very common source of errors, so take great care when editing the various files.
All model files should be placed under the "sphinx4/models/acoustic" directory. For the TOY models, we create the directory "toy" under "sphinx4/models/acoustic":
sphinx4> cd models/acoustic
sphinx4/models/acoustic> mkdir toy
Copy all the model files to sphinx4/models/acoustic/toy/. After copying all the model files, the resulting sphinx4/models/acoustic/toy/ directory looks like:
cd_continuous_8gau/
cd_continuous_8gau/means
cd_continuous_8gau/mixture_weights
cd_continuous_8gau/variances
cd_continuous_8gau/transition_matrices
dict/
dict/cmudict.0.6d
dict/fillerdict
etc/
etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef
etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ci.mdef

Note that all the files under "cd_continuous_8gau" are binary files in this example.
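Since a missing or misnamed file is easy to overlook, the layout can be checked before building. The following is a minimal sketch and not part of Sphinx-4: the class name ToyLayoutCheck and the method missingFiles are hypothetical, and the file list is simply the TOY listing from this example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Sphinx-4): reports which of the
// expected TOY model files are absent from a model directory.
final class ToyLayoutCheck {

    // Relative paths taken from the TOY example in this document.
    static final List<String> EXPECTED = Arrays.asList(
        "cd_continuous_8gau/means",
        "cd_continuous_8gau/mixture_weights",
        "cd_continuous_8gau/variances",
        "cd_continuous_8gau/transition_matrices",
        "dict/cmudict.0.6d",
        "dict/fillerdict",
        "etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef",
        "etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ci.mdef");

    // Returns the expected relative paths that are not regular files under modelDir.
    static List<String> missingFiles(Path modelDir, List<String> expected) {
        List<String> missing = new ArrayList<>();
        for (String relative : expected) {
            if (!Files.isRegularFile(modelDir.resolve(relative))) {
                missing.add(relative);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Default assumes the program is run from the sphinx4 top level directory.
        Path modelDir = Paths.get(args.length > 0 ? args[0] : "models/acoustic/toy");
        List<String> missing = missingFiles(modelDir, EXPECTED);
        if (missing.isEmpty()) {
            System.out.println("All expected files found in " + modelDir);
        } else {
            System.out.println("Missing files: " + missing);
        }
    }
}
```

For your own models, replace the EXPECTED list with the files that SphinxTrain produced.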
Create a text file called model.props under sphinx4/models/acoustic/toy/. This file must have the following properties:
description = TOY acoustic models
modelClass = edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
modelLoader = edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader
dataLocation = cd_continuous_8gau
modelDefinition = etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef
isBinary = true
featureType = 1s_c_d_dd
vectorLength = 39
sparseForm = false
numberFftPoints = 512
numberFilters = 40
gaussians = 8
minimumFrequency = 130
maximumFrequency = 6800
sampleRate = 16000
Explanation of the properties:
description | a description of the acoustic model, which is "TOY acoustic models" in this example |
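modelClass | should be set to "edu.cmu.sphinx.model.acoustic.${MEANINGFUL_NAME}.Model". ${MEANINGFUL_NAME} usually encodes the main training parameters. In this example, TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz reflects the model name, 8 Gaussians, 13-dimensional cepstra, a 16 kHz sample rate, 40 mel filters, and a frequency range of 130 Hz to 6800 Hz. |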
modelLoader | similar to "modelClass"; it should be set to "edu.cmu.sphinx.model.acoustic.${MEANINGFUL_NAME}.ModelLoader" |
dataLocation | the directory where all the model data files are, which in this example is the directory "cd_continuous_8gau". This is the location with respect to the modelLoader class inside the final JAR file. |
modelDefinition | the location of the .mdef file, which in this example is "etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef". This is the location with respect to the modelLoader class inside the final JAR file. |
isBinary | whether the model files (i.e., the means, variances, mixture_weights and transition_matrices files) are binary or ascii |
featureType | the SphinxTrain name for the type of feature generated from the training data. The name here is 1s_c_d_dd, which means the cepstra, the delta cepstra, and the double delta of the cepstra. Currently, only models trained from 1s_c_d_dd and s3_1x39 features are supported by Sphinx-4. |
vectorLength | the length of the feature vector, which is usually 39 |
sparseForm | whether the transition matrices of the acoustic model are in sparse form, i.e., omitting the zeros of the non-transitioning states |
numberFftPoints | the number of FFT points used when creating features for training |
numberFilters | the number of filters used when creating features for training |
gaussians | the number of Gaussians of the generated models |
maximumFrequency | the maximum frequency of the mel filters used when creating features for training |
minimumFrequency | the minimum frequency of the mel filters used when creating features for training |
sampleRate | the sample rate of the training data |
These properties will be printed out if you run the JAR file that the build step (typing "ant" at the top level directory, described later in this document) creates, for example:
sphinx4> java -jar lib/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar

TOY acoustic models

dataLocation: cd_continuous_8gau
description: TOY acoustic models
featureType: 1s_c_d_dd
gaussians: 8
isBinary: true
maximumFrequency: 6800
minimumFrequency: 130
modelClass: edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model
modelDefinition: etc/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.4000.mdef
modelLoader: edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader
numberFftPoints: 512
numberFilters: 40
sampleRate: 16000
sparseForm: false
vectorLength: 39
This will help whoever uses your acoustic model to specify the values in their configuration file correctly. Each line displayed here is a line in the model.props file.
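Because a mistyped or forgotten property only shows up later, it can help to check model.props before building. The sketch below assumes that model.props follows the standard java.util.Properties syntax, which the "key = value" lines above are consistent with. The class ModelPropsCheck and its methods are hypothetical names and not part of Sphinx-4.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of Sphinx-4): checks that a model.props
// file defines every property listed in this document.
final class ModelPropsCheck {

    // The fifteen property names from the TOY example above.
    static final List<String> REQUIRED = Arrays.asList(
        "description", "modelClass", "modelLoader", "dataLocation",
        "modelDefinition", "isBinary", "featureType", "vectorLength",
        "sparseForm", "numberFftPoints", "numberFilters", "gaussians",
        "minimumFrequency", "maximumFrequency", "sampleRate");

    // Parses "key = value" lines using the standard Properties format.
    static Properties load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return props;
    }

    // Returns the required keys that are absent or have an empty value.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Default assumes the program is run from the sphinx4 top level directory.
        String path = args.length > 0 ? args[0] : "models/acoustic/toy/model.props";
        try (Reader reader = Files.newBufferedReader(Paths.get(path), StandardCharsets.ISO_8859_1)) {
            System.out.println("Missing properties: " + missingKeys(load(reader)));
        }
    }
}
```

An empty list means every property named in this document is present; it does not verify that the values match how the model was actually trained.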
Modify build.xml, which is the ANT script that creates the acoustic model JAR files. First define properties for the name of your acoustic model and the directory in which your acoustic model data is located. The name of your acoustic model should be the ${MEANINGFUL_NAME}, which in our example is "TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz". Therefore, we will add the following in the section of build.xml that says "For generating the WSJ...":
<property name="toy_name" value="TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz"/>
<property name="toy_data_dir" value="models/acoustic/toy"/>
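The same ${MEANINGFUL_NAME} appears in model.props, in build.xml, in the JAR file name, and in the configuration file, so a single typo can break the chain. As a minimal sketch, the strings used in this document can all be derived from the one name. The class ModelNames and its methods are hypothetical and not part of Sphinx-4; the patterns are the ones shown in this document.

```java
// Hypothetical helper (not part of Sphinx-4): derives the strings used in
// this document from a single ${MEANINGFUL_NAME} value.
final class ModelNames {

    // Package prefix used for modelClass and modelLoader in model.props.
    static final String PACKAGE_PREFIX = "edu.cmu.sphinx.model.acoustic.";

    // Value for the "modelClass" property.
    static String modelClass(String name) {
        return PACKAGE_PREFIX + name + ".Model";
    }

    // Value for the "modelLoader" property.
    static String modelLoader(String name) {
        return PACKAGE_PREFIX + name + ".ModelLoader";
    }

    // Location of the JAR file produced by the build, relative to the top level directory.
    static String jarFile(String name) {
        return "lib/" + name + ".jar";
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz";
        System.out.println("modelClass  = " + modelClass(name));
        System.out.println("modelLoader = " + modelLoader(name));
        System.out.println("JAR file    = " + jarFile(name));
    }
}
```

Copying the printed values into the files, rather than retyping them, is one way to keep the names consistent.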
Search for the ANT target "create_all_model_classes". Add lines after the last "antcall" to create your model classes. In our example, we will add the following lines:
<antcall target="create_my_model_classes">
    <param name="my_model_name" value="${toy_name}"/>
</antcall>
This is the clean up step. Search for the ANT target "delete_all_model_classes". Add lines after the last "antcall" to delete your model classes. In our example, we will add the following lines:
<antcall target="delete_my_model_classes">
    <param name="my_model_name" value="${toy_name}"/>
</antcall>
Search for the ANT target "create_all_models". Add lines after the last "antcall" to create your models. In our example, we will add the following lines:
<antcall target="create_my_model">
    <param name="my_model_data_dir" value="${toy_data_dir}"/>
    <param name="my_model_name" value="${toy_name}"/>
</antcall>

This is the last step in the editing of the build.xml file.
At the top level directory, type "ant". This should build all the acoustic model JAR files, which will be found in the "lib" directory.
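After the build, it can be reassuring to confirm that the model data and dictionaries really ended up inside the JAR file. The following minimal sketch lists the JAR entries under a given prefix using the standard java.util.jar API. The class ModelJarListing is a hypothetical name, and the default paths are the ones from the TOY example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Sphinx-4): lists the entries of a JAR
// file whose names start with a given prefix.
final class ModelJarListing {

    // Entry names inside a JAR have no leading slash,
    // e.g. "edu/cmu/sphinx/model/acoustic/...".
    static List<String> entriesUnder(String jarPath, String prefix) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(prefix)) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String model = "TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz";
        String jarPath = args.length > 0 ? args[0] : "lib/" + model + ".jar";
        String prefix = "edu/cmu/sphinx/model/acoustic/" + model + "/";
        for (String name : entriesUnder(jarPath, prefix)) {
            System.out.println(name);
        }
    }
}
```

If the listing does not show the dict, etc, and data directories, recheck the edits made to build.xml and the contents of the model data directory.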
In your Sphinx-4 configuration file, you usually need to refer to the acoustic model JAR file in two places: the acoustic model component and the dictionary component. For example, the acoustic model should be specified as:
<component name="toy" type="edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model">
    <property name="loader" value="sphinx3Loader"/>
    <property name="unitManager" value="unitManager"/>
</component>

<component name="sphinx3Loader" type="edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.ModelLoader">
    <property name="logMath" value="logMath"/>
    <property name="unitManager" value="unitManager"/>
</component>
There is an example of unitManager in most config files.
Note that edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model is the class of your acoustic model, which is in the final JAR file. If you include the JAR file in your CLASSPATH, Java will find it.
The dictionary file is usually packaged within the acoustic model JAR file. Inside the JAR file, the cmudict.0.6d is located at /edu/cmu/sphinx/model/acoustic/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/. Inside the configuration file, it should be specified as:
<component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FullDictionary">
    <property name="dictionaryPath" value="resource:/edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model!/edu/cmu/sphinx/model/acoustic/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d"/>
    ...
</component>
The value="resource:..." line means that the dictionary is located in the resource that contains the edu.cmu.sphinx.model.acoustic.TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.Model class, which is the acoustic model JAR file. Inside that resource (i.e., the acoustic model JAR file), the dictionary is located at /edu/cmu/sphinx/model/acoustic/TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d. The same applies to the "fillerPath" property.
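To make the structure of such a value concrete, the sketch below splits it at the "!" into the class name and the path, and then looks the path up through the class loader of that class. This only illustrates the idea using standard Java calls; it is not the code Sphinx-4 itself uses to resolve these values, and the class ResourceLocation is a hypothetical name.

```java
import java.net.URL;

// Hypothetical illustration (not Sphinx-4 code) of how a value of the form
// "resource:/<class name>!<path>" breaks down into its two parts.
final class ResourceLocation {

    static final String PREFIX = "resource:/";

    final String className; // class that identifies the JAR, e.g. the Model class
    final String path;      // absolute path of the file inside that JAR

    ResourceLocation(String className, String path) {
        this.className = className;
        this.path = path;
    }

    // Splits the value at the "!" that separates the class name from the path.
    static ResourceLocation parse(String value) {
        int bang = value.indexOf('!');
        if (!value.startsWith(PREFIX) || bang < 0) {
            throw new IllegalArgumentException("Not a resource location: " + value);
        }
        return new ResourceLocation(
            value.substring(PREFIX.length(), bang),
            value.substring(bang + 1));
    }

    // Asks the class loader of the named class for the path. Returns null when
    // the class or the resource cannot be found on the classpath.
    URL resolve() {
        try {
            Class<?> anchor = Class.forName(
                className, false, ResourceLocation.class.getClassLoader());
            return anchor.getResource(path);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = "TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz";
        ResourceLocation loc = parse(
            "resource:/edu.cmu.sphinx.model.acoustic." + name + ".Model"
            + "!/edu/cmu/sphinx/model/acoustic/" + name + "/dict/cmudict.0.6d");
        System.out.println("class: " + loc.className);
        System.out.println("path:  " + loc.path);
        // The URL is null unless the model JAR file is on the classpath.
        System.out.println("url:   " + loc.resolve());
    }
}
```

Running it with the model JAR file on the classpath is a quick way to see whether the class name and the path in your dictionaryPath value agree with what was actually packaged.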
Finally, remember to include the model JAR file in your Java CLASSPATH, which in our example is TOY_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar.