Instrumentation for Sphinx-4



Introduction

Sphinx-4 can be configured to output various collections of information that may be useful for users and developers. This information includes log messages, recognition accuracy, recognition speed, memory usage, and dumps of the configuration and the search graph.
The output of the various types of instrumentation information is controllable from the configuration file.  Let's look in detail at what information is displayed and how to control what is output.

Silence is Golden

First let's look at a Sphinx-4 configuration file for a tidigits task.  (You can learn more about configuration files by reading Sphinx-4 Configuration Management).  The Sphinx-4 configuration file silent.config.xml shows a standard configuration for recognizing connected digits.  It is based upon the tidigits.config.xml found in sphinx4/tests/performance/tidigits, except that all logging and instrumentation have been disabled.  If we run Sphinx-4 with this configuration we get absolutely no output:

% java edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
%
That is probably not very useful for most applications.  Let's take a look to see if we can get some recognition results using the Logger.

Using the Logger

Any well-behaved Sphinx component (there are some that are not so well behaved...) that needs to output informational messages will do so via the Sphinx-4 logger.  Each message has a level of importance associated with it. Some messages indicate severe problems, some are warnings, some are informational, and some are fine-level tracing messages. The complete set of log levels, from most to least severe, is SEVERE, WARNING, INFO, CONFIG, FINE, FINER and FINEST.
In silent.config.xml there is a global property called logLevel that is set to OFF.
<config>
<property name="logLevel" value="OFF"/>
<!-- components omitted -->
</config>
This indicates that, by default, nothing will be logged to the console at all.  This, of course, is dangerous because we probably want to see at least all warning and error messages.  Let's turn on warning and error messages. We do this by setting the logLevel to WARNING like so:
<config>
<property name="logLevel" value="WARNING"/>
<!-- components omitted -->
</config>
By setting the logLevel to WARNING, we are saying that we want to see all log messages at the WARNING level or higher. With this setting we should see WARNING and SEVERE messages.  (Note that this is the default setting anyway, so if you omit setting logLevel at the global level, the logLevel is automatically set to WARNING).
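
The threshold rule can be illustrated with plain java.util.logging, whose level names match the ones used here. This is only a sketch of the comparison, not Sphinx-4's own configuration code; the class name LogLevelThreshold is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;

public class LogLevelThreshold {
    // Standard java.util.logging levels, most severe first.
    static final Level[] LEVELS = {
        Level.SEVERE, Level.WARNING, Level.INFO, Level.CONFIG,
        Level.FINE, Level.FINER, Level.FINEST
    };

    /** Returns the names of the levels that pass the given threshold. */
    static List<String> enabledAt(Level threshold) {
        List<String> enabled = new ArrayList<>();
        for (Level level : LEVELS) {
            // A message is emitted when its level value is at least the threshold's.
            if (level.intValue() >= threshold.intValue()) {
                enabled.add(level.getName());
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("WARNING -> " + enabledAt(Level.WARNING)); // [SEVERE, WARNING]
        System.out.println("INFO    -> " + enabledAt(Level.INFO));    // [SEVERE, WARNING, INFO]
        System.out.println("OFF     -> " + enabledAt(Level.OFF));     // []
    }
}
```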

Let's run this again with our new settings:
% java edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
%
It is still silent, which means we don't have any warnings or errors in our run.  Now, let's see what an error looks like. To force an error, I'll delete one of the audio input files listed in the tidigits.batch file. This should cause an error when the recognizer attempts to decode the missing file.  Here's an example:
% java edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch

07:37.604 SEVERE I/O error during decoding: /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.o5o6671a.wav.raw\
 (No such file or directory) in edu.cmu.sphinx.tools.batch.BatchModeRecognizer:decode
%
This time we get a SEVERE error report showing when and where the error occurred. Note that the log includes the timestamp for the error, the level of the error, a detailed error message and an indication of where in the code the error occurred.

Now let's restore the missing file so we don't get this error anymore and try to get some results displayed.

The JavaDocs for the BatchModeRecognizer  indicate that the BatchModeRecognizer will log results at the INFO level.  Let's try setting the logLevel to INFO to see what the BatchModeRecognizer reports.
<config>
<property name="logLevel" value="INFO"/>
<!-- components omitted -->
</config>
By setting the logLevel to INFO we are enabling logs at the INFO, WARNING and SEVERE levels.

With this new setting let's run the recognizer again to see what output we get:
% java edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
08:23.006 INFO logMath Log base is 1.0001
08:23.020 INFO logMath Using AddTable when adding logs
08:23.021 INFO logMath LogAdd table has 99022 entries.
08:23.683 INFO sphinx3Loader Sphinx3Loader
08:23.684 INFO sphinx3Loader Pool wd_dependent_phone.cd_continuous_8gau/means Entries: 4816
08:23.686 INFO sphinx3Loader Pool wd_dependent_phone.cd_continuous_8gau/variances Entries: 4816
08:23.687 INFO sphinx3Loader Pool wd_dependent_phone.cd_continuous_8gau/transition_matrices Entries: 34
08:23.688 INFO sphinx3Loader Pool senones Entries: 602
08:23.689 INFO sphinx3Loader Pool meanTransformationMatrix Entries: 1
08:23.690 INFO sphinx3Loader Pool meanTransformationMatrix Entries: 1
08:23.691 INFO sphinx3Loader Pool varianceTransformationMatrix Entries: 1
08:23.692 INFO sphinx3Loader Pool varianceTransformationMatrix Entries: 1
08:23.693 INFO sphinx3Loader Pool wd_dependent_phone.cd_continuous_8gau/mixture_weights Entries: 602
08:23.694 INFO sphinx3Loader Pool senones Entries: 602
08:23.696 INFO sphinx3Loader Context Independent Unit Entries: 34
08:23.697 INFO sphinx3Loader HMM Manager: 430 hmms
08:23.698 INFO acousticModel CompositeSenoneSequences: 0
08:23.700 INFO dictionary Loading dictionary from:
08:23.701 INFO dictionary file:/lab/speech/sphinx4/data/tidigits_8gau_13dCep_16k_40mel_130Hz_6800Hz.bin.zip/dictionary
08:23.712 INFO dictionary Loading filler dictionary from:
08:23.714 INFO dictionary file:/lab/speech/sphinx4/data/tidigits_8gau_13dCep_16k_40mel_130Hz_6800Hz.bin.zip/fillerdict
08:23.728 INFO wordListGrammar Num nodes : 14
08:23.729 INFO wordListGrammar Num arcs : 34
08:23.731 INFO wordListGrammar Avg arcs : 2.4285715
08:23.306 INFO threadedScorer # of scoring threads: 1
08:23.393 INFO batch BatchDecoder: decoding files in tidigits.batch
08:23.173 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.111a.wav.raw
08:23.175 INFO batch Result: <sil> one one one
08:23.645 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.139oa.wav.raw
08:23.647 INFO batch Result: <sil> one three nine oh
08:24.957 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.155a.wav.raw
08:24.958 INFO batch Result: <sil> one five five
08:24.278 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1688a.wav.raw
08:24.279 INFO batch Result: <sil> one six eight eight
08:24.987 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1a.wav.raw
08:24.988 INFO batch Result: <sil> one

It looks like a number of other components are issuing INFO messages that clutter up our output.  We'd like to turn the other INFO messages off and just get the BatchModeRecognizer INFO messages.  We can do this by setting the logLevel at the individual component level.  Each component can have its own logging level, so different components can log messages at different levels.  Since we only want the BatchModeRecognizer to output INFO messages, let's restore the overall logging level to WARNING and set the logLevel for 'batch' (the name of the BatchModeRecognizer component) to INFO.


<config>
<property name="logLevel" value="WARNING"/>

<component name="batch"
type="edu.cmu.sphinx.tools.batch.BatchModeRecognizer">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="inputSource" value="streamDataSource"/>
<property name="logLevel" value="INFO"/>
</component>


<!-- many components omitted -->
</config>
Now let's look at our output:

% java -cp ../../../bld/classes/ -Dskip=20 edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
08:26.591 INFO batch BatchDecoder: decoding files in tidigits.batch
08:26.260 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.111a.wav.raw
08:26.262 INFO batch Result: <sil> one one one
08:26.749 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.139oa.wav.raw
08:26.751 INFO batch Result: <sil> one three nine oh
08:26.105 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.155a.wav.raw
08:26.107 INFO batch Result: <sil> one five five
08:26.390 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1688a.wav.raw
08:26.391 INFO batch Result: <sil> one six eight eight
08:26.022 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1a.wav.raw
08:26.023 INFO batch Result: <sil> one
08:26.029 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1b.wav.raw
08:26.030 INFO batch Result: <sil> one
08:26.048 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1za.wav.raw
08:26.049 INFO batch Result: <sil> one zero
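
The same "global default plus per-component override" idea can be sketched with plain java.util.logging, where a child logger without its own level inherits its parent's. The logger names under "demo" and the class name are hypothetical, chosen only to mirror the component names in the output; this is not how Sphinx-4 itself wires up its loggers.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class PerComponentLevels {
    // Keep strong references: the JDK's LogManager only holds loggers weakly.
    static final Logger PARENT = Logger.getLogger("demo");
    static final Logger BATCH = Logger.getLogger("demo.batch");
    static final Logger LOADER = Logger.getLogger("demo.sphinx3Loader");

    static void configure() {
        PARENT.setLevel(Level.WARNING); // overall default
        BATCH.setLevel(Level.INFO);     // override for one component
        LOADER.setLevel(null);          // null means "inherit from the parent logger"
    }

    public static void main(String[] args) {
        configure();
        System.out.println("batch  INFO enabled:    " + BATCH.isLoggable(Level.INFO));
        System.out.println("loader INFO enabled:    " + LOADER.isLoggable(Level.INFO));
        System.out.println("loader WARNING enabled: " + LOADER.isLoggable(Level.WARNING));
    }
}
```

With this setup only the "batch" logger passes INFO messages, while the other component still reports warnings and errors.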


There are ways to control the terseness of the actual output as well.  Setting the global property logTerse to true results in the ancillary information (timestamp, level, source component) being omitted.

<config>
<property name="logLevel" value="WARNING"/>

<component name="batch"
type="edu.cmu.sphinx.tools.batch.BatchModeRecognizer">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="inputSource" value="streamDataSource"/>
<property name="logLevel" value="INFO"/>
<property name="logTerse" value="true"/>
</component>

</config>
Here's the terse output:

% java -cp ../../../bld/classes/ -Dskip=20 edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
Handler java.util.logging.ConsoleHandler@cdfc9c
BatchDecoder: decoding files in tidigits.batch
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.111a.wav.raw
Result: <sil> one one one
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.139oa.wav.raw
Result: <sil> one three nine oh
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.155a.wav.raw
Result: <sil> one five five
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1688a.wav.raw
Result: <sil> one six eight eight
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1a.wav.raw
Result: <sil> one
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1b.wav.raw
Result: <sil> one
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1za.wav.raw
Result: <sil> one zero

At this point we know enough about the logger to be able to turn it on and off, to control the level of logging output on a per component basis and to configure the appearance of the logging output.

Tracking Accuracy

Now let's look at how we can track the accuracy of Sphinx-4.  One of the prime measures of the overall quality of a speech recognition system is the recognition accuracy.  This statistic shows how well the sentence hypotheses produced by the recognizer match the actual transcripts of what was spoken.  Obviously, recognition accuracy can only be reported when the transcripts are available as well.  All of the Sphinx-4 performance tests (found under the sphinx4/tests/performance directory) include transcripts. For instance, the batch file tidigits.batch begins like so:

/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.111a.wav.raw one one one
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.139oa.wav.raw one three nine oh
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.155a.wav.raw one five five
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1688a.wav.raw one six eight eight
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1a.wav.raw one
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1b.wav.raw one
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1za.wav.raw one zero
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.24z982za.wav.raw two four
Each line represents a single utterance. The first entry on each line contains the path name to the audio that is to be recognized. The remaining entries are the words that make up the transcript for the utterance.  Using this information the BatchModeRecognizer can make available the transcripts necessary for producing accuracy statistics.
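
As a rough illustration of that layout, a batch line can be split into its audio path and transcript words as follows. This is a minimal sketch, not the BatchModeRecognizer's actual parser; the class BatchLine is hypothetical, and it assumes the path contains no whitespace.

```java
import java.util.Arrays;
import java.util.List;

public class BatchLine {
    final String audioPath;
    final List<String> transcript;

    BatchLine(String audioPath, List<String> transcript) {
        this.audioPath = audioPath;
        this.transcript = transcript;
    }

    /** Splits one batch-file line: first field is the path, the rest is the transcript. */
    static BatchLine parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields[0].isEmpty()) {
            throw new IllegalArgumentException("empty batch line");
        }
        return new BatchLine(fields[0], Arrays.asList(fields).subList(1, fields.length));
    }

    public static void main(String[] args) {
        BatchLine entry = parse(
            "/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.139oa.wav.raw one three nine oh");
        System.out.println("audio:      " + entry.audioPath);
        System.out.println("transcript: " + String.join(" ", entry.transcript));
    }
}
```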

The accuracy tracker is a component that is typically added to the set of monitors for a recognizer.  The accuracy tracker will monitor the recognizer, and when the recognizer generates a result, the tracker will compare the resulting hypothesis to the appropriate transcript and generate the statistics.

Let's configure our system now to include an accuracy tracker.  First we add an entry for the component itself:
    <component name="accuracyTracker"
type="edu.cmu.sphinx.instrumentation.AccuracyTracker">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="showAlignedResults" value="true"/>
<property name="showRawResults" value="true"/>
</component>


Next, we add the accuracy tracker to the set of recognizer monitors like so:
    <component name="connectedDigitsRecognizer"
type="edu.cmu.sphinx.recognizer.Recognizer">
<property name="decoder" value="digitsDecoder"/>
<propertylist name="monitors">
<item>accuracyTracker </item>
</propertylist>
</component>


Also, since the accuracy tracker will output results, we can turn off the output of the results by the 'batch' component by resetting its logLevel setting to WARNING.

Here's the output:
% java -cp ../../../bld/classes/ -Dskip=20 edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
(... many lines omitted)

REF: four one six
HYP: four one six
ALIGN_REF: four one six
ALIGN_HYP: four one six
RAW <sil> four one six

Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 78 Matches: 78 WER: 0.000%
Sentences: 23 Matches: 23 SentenceAcc: 100.000%

REF: four two eight oh oh oh nine
HYP: four two eight oh oh nine
ALIGN_REF: four two eight oh oh OH nine
ALIGN_HYP: four two eight oh oh ** nine
RAW <sil> four two eight oh oh nine

Accuracy: 98.824% Errors: 1 (Sub: 0 Ins: 0 Del: 1)
Words: 85 Matches: 84 WER: 1.176%
Sentences: 24 Matches: 23 SentenceAcc: 95.833%

REF: four five two zero three
HYP: four five two zero three
ALIGN_REF: four five two zero three
ALIGN_HYP: four five two zero three
RAW <sil> four five two zero three

As you can see, the accuracy tracker outputs quite a bit of information.  Let's look at the information in detail:
REF
Reference - This is the reference or transcript. This is what should be recognized.
HYP
Hypothesis - This is the result that is generated by the recognizer. This is what was  recognized.
ALIGN_REF
Aligned Reference - This is the reference text, where mismatches between the reference and the hypothesis are highlighted. 
ALIGN_HYP
Aligned Hypothesis -  This is the recognized text with mismatched text highlighted.
RAW
Raw Text - this is the actual text recognized, including all filler words such as silences, coughs, lip smacks, breaths and so on.
Accuracy
Word Accuracy - The number of matching words compared to the total number of words in the input as a percent.
Errors
Word Error Count - The total number of word errors.
Sub
Substitution count - The total number of substitution errors. A substitution error occurs when one word is replaced by another.
Ins
Insertion count - The total number of insertion errors. An insertion error occurs when an extra word is inserted in the hypothesis.
Del
Deletion count - The total number of deletion errors. A deletion error occurs when a word is missing in the hypothesis.
Words
Reference word count - The total number of words expected
Matches
Matching word count - The total number of matching words
WER
Word error rate - This is equal to (sub + ins + del) / words * 100
Sentences
Reference sentence count - The total number of sentences.
Matches
Matching sentences - The total number of matching sentences
SentenceAcc
Sentence Accuracy  - This is equal to (matches / sentences) * 100
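
To make these definitions concrete, here is a small sketch that counts substitutions, insertions and deletions with a standard word-level minimum-edit-distance alignment and then applies the WER formula above. It is an illustration under that assumption, not the AccuracyTracker's own aligner; the class and method names are hypothetical, and when several minimum-cost alignments exist the split between error types may differ from another aligner's.

```java
import java.util.Locale;

public class WordErrorSketch {

    /** Returns {substitutions, insertions, deletions} for one minimum-edit alignment. */
    static int[] countErrors(String[] ref, String[] hyp) {
        int n = ref.length, m = hyp.length;
        int[][] cost = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) cost[i][0] = i; // only deletions
        for (int j = 0; j <= m; j++) cost[0][j] = j; // only insertions
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = cost[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int delCost = cost[i - 1][j] + 1;
                int insCost = cost[i][j - 1] + 1;
                cost[i][j] = Math.min(diag, Math.min(delCost, insCost));
            }
        }
        // Walk back from the end of both sentences, classifying each step.
        int sub = 0, ins = 0, del = 0;
        int i = n, j = m;
        while (i > 0 || j > 0) {
            boolean same = i > 0 && j > 0 && ref[i - 1].equals(hyp[j - 1]);
            if (i > 0 && j > 0 && cost[i][j] == cost[i - 1][j - 1] + (same ? 0 : 1)) {
                if (!same) sub++;
                i--; j--;
            } else if (i > 0 && cost[i][j] == cost[i - 1][j] + 1) {
                del++; i--;   // reference word missing from the hypothesis
            } else {
                ins++; j--;   // extra word in the hypothesis
            }
        }
        return new int[] {sub, ins, del};
    }

    /** part / whole expressed as a percentage. */
    static double percent(int part, int whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        String[] ref = "four two eight oh oh oh nine".split(" ");
        String[] hyp = "four two eight oh oh nine".split(" ");
        int[] e = countErrors(ref, hyp);
        int errors = e[0] + e[1] + e[2];
        System.out.printf(Locale.ROOT, "Sub: %d Ins: %d Del: %d%n", e[0], e[1], e[2]);
        System.out.printf(Locale.ROOT, "WER for this utterance: %.3f%%%n", percent(errors, ref.length));
        // Running totals from the sample output: 85 words, 84 matches, 1 error.
        System.out.printf(Locale.ROOT, "Accuracy: %.3f%% WER: %.3f%%%n", percent(84, 85), percent(1, 85));
    }
}
```

For the sample utterance this reports a single deletion, and the running totals reproduce the 98.824% accuracy and 1.176% WER shown in the output.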

First it shows the REF and HYP outputs.  REF is the reference or transcript. This is the expected result. HYP is the hypothesis, the result that was generated by the recognizer.

That's a whole lot of stuff; in fact, it is probably more than we need. We can (of course) configure the accuracy tracker to reduce the amount of output.  Let's turn off the ALIGN and the RAW outputs:
    <component name="accuracyTracker"
type="edu.cmu.sphinx.instrumentation.AccuracyTracker">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="showAlignedResults" value="false"/>
<property name="showRawResults" value="false"/>
</component>
The accuracy tracker will also show summary information at the end of a run (when the recognizer is deallocated).   Here's an example showing the reduced output and the summary information.


REF: one
HYP: one

Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 15 Matches: 15 WER: 0.000%
Sentences: 5 Matches: 5 SentenceAcc: 100.000%

# --------------- Summary statistics ---------
Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 15 Matches: 15 WER: 0.000%
Sentences: 5 Matches: 5 SentenceAcc: 100.000%



The summary statistics show the total accuracy data for the entire run.

Tracking Speed

Another important aspect of speech recognition is the speed of recognition.  The speed tracker will track and report statistics relating to the speed of recognition.  The speed tracker is added to the set of monitors in the recognizer in the same way that the accuracy tracker is added:
    <component name="connectedDigitsRecognizer"
type="edu.cmu.sphinx.recognizer.Recognizer">
<property name="decoder" value="digitsDecoder"/>
<propertylist name="monitors">
<item>accuracyTracker </item>
<item>speedTracker </item>
</propertylist>
</component>

<component name="speedTracker"
type="edu.cmu.sphinx.instrumentation.SpeedTracker">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="frontend" value="${frontend}"/>
</component>

Here's some output of the speed tracker:
REF:       one one one
HYP: one one one

Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 3 Matches: 3 WER: 0.000%
Sentences: 1 Matches: 1 SentenceAcc: 100.000%
This Time Audio: 1.38s Proc: 2.16s Speed: 1.56 X real time
Total Time Audio: 1.38s Proc: 2.16s Speed: 1.56 X real time

REF: one three nine oh
HYP: one three nine oh

Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 7 Matches: 7 WER: 0.000%
Sentences: 2 Matches: 2 SentenceAcc: 100.000%
This Time Audio: 1.47s Proc: 0.97s Speed: 0.66 X real time
Total Time Audio: 2.85s Proc: 3.13s Speed: 1.10 X real time



The data output by the speed tracker are:
This Time Audio
The length of time (in seconds) of the current audio.
This Time Proc
The time spent processing the current audio.
This Time Speed
Processing time / audio time for the current audio.
Total Time Audio
The length of time of all audio processed so far.
Total Time Proc
The time spent processing all audio.
Total Time Speed
Total processing time / total audio time.
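
The two speed figures are simple ratios, which the following sketch reproduces for the second utterance and the running totals in the sample output. The class SpeedStats is hypothetical and only illustrates the arithmetic; it is not the SpeedTracker's implementation.

```java
import java.util.Locale;

public class SpeedStats {
    private double totalAudio; // seconds of audio seen so far
    private double totalProc;  // seconds spent processing so far

    /** Speed as a multiple of real time: values below 1.0 are faster than real time. */
    static double realTimeFactor(double audioSeconds, double procSeconds) {
        return procSeconds / audioSeconds;
    }

    /** Adds one utterance to the running totals. */
    void add(double audioSeconds, double procSeconds) {
        totalAudio += audioSeconds;
        totalProc += procSeconds;
    }

    double totalSpeed() {
        return realTimeFactor(totalAudio, totalProc);
    }

    public static void main(String[] args) {
        SpeedStats stats = new SpeedStats();
        stats.add(1.38, 2.16); // first utterance from the sample output
        stats.add(1.47, 0.97); // second utterance
        System.out.printf(Locale.ROOT, "This  Speed: %.2f X real time%n", realTimeFactor(1.47, 0.97));
        System.out.printf(Locale.ROOT, "Total Speed: %.2f X real time%n", stats.totalSpeed());
    }
}
```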


Dumping Response Time
The speed tracker can also be configured to show response time. This is useful when running in a live-mode situation where front-end buffering latency can affect the perceived performance of the system.
The speed tracker configuration for enabling tracking of response time is shown here:

    <component name="speedTracker"
type="edu.cmu.sphinx.instrumentation.SpeedTracker">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="frontend" value="${frontend}"/>
<property name="showResponseTime" value="true"/>
</component>
The response time output looks like this:

HYP: one three nine oh
Sentences: 2
This Time Audio: 1.15s Proc: 0.86s Speed: 0.75 X real time
Total Time Audio: 3.33s Proc: 3.52s Speed: 1.06 X real time
Response Time: Avg: 0.032333333s Max: 0.085s Min: 0.0060s

The response time field includes the average (Avg), maximum (Max) and minimum (Min) response time encountered. This is the time from when the front-end first encounters a packet of audio until it is delivered to the decoding portion of the recognizer.  This gives a good measure of the latency due to front-end processing such as normalization and end-pointing.

Dumping Timing Statistics
The speed tracker can also be configured to dump out low level timing data for various aspects of the recognition process.  Many of the components in the Sphinx-4 system will collect detailed timing statistics.  For instance, the linguist may keep track of how long it takes to build the search graph, and the acoustic model loader may keep track of how long it takes to load the acoustic model from a compressed file. 
Setting the speedTracker showTimers property to true will cause the timing information to be dumped. The timing information is dumped immediately after the system is initialized, and again when the recognizer is deallocated.  Here's a sample of the timing output:

# ----------------------------- Timers----------------------------------------
# Name           Count  CurTime  MinTime  MaxTime  AvgTime  TotTime
streamDataSourc    196  0.0000s  0.0000s  0.0390s  0.0004s  0.0760s
premphasizer       196  0.0000s  0.0000s  0.0130s  0.0001s  0.0190s
windower           194  0.0010s  0.0000s  0.0550s  0.0009s  0.1840s
fft               1732  0.0000s  0.0000s  0.0530s  0.0003s  0.4780s
melFilterBank     1732  0.0000s  0.0000s  0.0410s  0.0000s  0.0790s
dct               1732  0.0000s  0.0000s  0.0280s  0.0001s  0.0920s
featureExtracti   1692  0.0000s  0.0000s  0.1610s  0.0001s  0.1980s
AM_Load              1  2.3060s  2.3060s  2.3060s  2.3060s  2.3060s
DictionaryLoad       1  0.0110s  0.0110s  0.0110s  0.0110s  0.0110s
compile              1  0.8750s  0.8750s  0.8750s  0.8750s  0.8750s
createGStates        1  0.0260s  0.0260s  0.0260s  0.0260s  0.0260s
collectContex        1  0.0050s  0.0050s  0.0050s  0.0050s  0.0050s
expandStates         1  0.7250s  0.7250s  0.7250s  0.7250s  0.7250s
connectNodes         1  0.0140s  0.0140s  0.0140s  0.0140s  0.0140s
scoring           1722  0.0000s  0.0000s  0.3980s  0.0037s  6.4570s
pruning           1712  0.0000s  0.0000s  0.0010s  0.0000s  0.0100s
growing           1722  0.0030s  0.0000s  0.0610s  0.0027s  4.666
This table shows the timing information after a short run of tidigits word list.   Here's the data key:

Name
The name of the operation
Count
The number of times the operation was invoked
CurTime
The most recent timing for this operation
MinTime
The fastest time for this operation
MaxTime
The slowest time for this operation
AvgTime
The average time for this operation
TotTime
The total time for this operation
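
The columns in this key can be maintained with a very small accumulator, sketched below. TimerStats is a hypothetical class written for illustration; it is not the timer class Sphinx-4 uses, and the row layout only approximates the dump shown above.

```java
import java.util.Locale;

public class TimerStats {
    final String name;
    long count;                    // Count: number of recorded invocations
    double cur;                    // CurTime: most recent timing, in seconds
    double min = Double.MAX_VALUE; // MinTime: fastest timing so far
    double max;                    // MaxTime: slowest timing so far
    double total;                  // TotTime: sum of all timings

    TimerStats(String name) {
        this.name = name;
    }

    /** Records one invocation that took the given number of seconds. */
    void record(double seconds) {
        count++;
        cur = seconds;
        min = Math.min(min, seconds);
        max = Math.max(max, seconds);
        total += seconds;
    }

    /** AvgTime: total time divided by the number of invocations. */
    double average() {
        return count == 0 ? 0.0 : total / count;
    }

    /** Formats one table row: Name Count CurTime MinTime MaxTime AvgTime TotTime. */
    String row() {
        double shownMin = count == 0 ? 0.0 : min;
        return String.format(Locale.ROOT, "%-15s %6d  %.4fs  %.4fs  %.4fs  %.4fs  %.4fs",
                name, count, cur, shownMin, max, average(), total);
    }

    public static void main(String[] args) {
        TimerStats timer = new TimerStats("scoring");
        timer.record(0.010);
        timer.record(0.030);
        timer.record(0.020);
        System.out.println(timer.row());
    }
}
```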

Tracking Memory Usage

For some applications, the overall memory footprint of the recognizer is important.  The MemoryTracker can be used to track the memory usage of Sphinx-4.   The MemoryTracker is added to the set of monitors in the recognizer in the same way that the accuracy tracker is added:

    <component name="connectedDigitsRecognizer" 
type="edu.cmu.sphinx.recognizer.Recognizer">
<property name="decoder" value="digitsDecoder"/>
<propertylist name="monitors">
<item>accuracyTracker </item>
<item>speedTracker </item>
<item>memoryTracker </item>
</propertylist>
</component>

<component name="memoryTracker"
type="edu.cmu.sphinx.instrumentation.MemoryTracker">
<property name="recognizer" value="connectedDigitsRecognizer"/>
</component>


The output of the memory tracker is as follows:

REF:       one
HYP: one

Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 16 Matches: 16 WER: 0.000%
Sentences: 6 Matches: 6 SentenceAcc: 100.000%
This Time Audio: 0.99s Proc: 0.64s Speed: 0.65 X real time
Total Time Audio: 7.47s Proc: 6.13s Speed: 0.82 X real time
Mem Total: 126.62 Mb Free: 112.26 Mb
Used: This: 14.36 Mb Avg: 14.35 Mb Max: 18.82 Mb

The memory tracker outputs five data items:

Mem Total
The total amount of memory allocated to the VM
Free
How much of the Mem Total is currently not being used
Used This
How much memory is currently being used
Used Avg
The average amount of memory used
Used Max
The maximum amount of memory used
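
One plausible way to obtain figures like these on a JVM is java.lang.Runtime, where used memory is total minus free. The sketch below assumes that approach and keeps a running average and maximum; MemoryStats is a hypothetical class, not the MemoryTracker's source.

```java
import java.util.Locale;

public class MemoryStats {
    static final double MB = 1024.0 * 1024.0;

    double usedNow;  // "Used This", in Mb
    double usedMax;  // "Used Max", in Mb
    private double usedSum;
    private int samples;

    /** Folds one (total, free) reading, in Mb, into the statistics. */
    void update(double totalMb, double freeMb) {
        usedNow = totalMb - freeMb;
        usedMax = Math.max(usedMax, usedNow);
        usedSum += usedNow;
        samples++;
    }

    /** "Used Avg": mean of all readings so far. */
    double usedAverage() {
        return samples == 0 ? 0.0 : usedSum / samples;
    }

    /** Takes a reading from the running JVM. */
    void sampleNow() {
        Runtime rt = Runtime.getRuntime();
        update(rt.totalMemory() / MB, rt.freeMemory() / MB);
    }

    public static void main(String[] args) {
        MemoryStats stats = new MemoryStats();
        stats.sampleNow();
        System.out.printf(Locale.ROOT, "Used: This: %.2f Mb Avg: %.2f Mb Max: %.2f Mb%n",
                stats.usedNow, stats.usedAverage(), stats.usedMax);
    }
}
```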

Miscellaneous Instrumentation

In addition to the previously described instrumentation, there are a few other monitors that are useful.

Configuration Monitor

The configuration monitor dumps out the current configuration of the system.  This dump differs from the configuration file in a few ways:
The Configuration Monitor (as well as most of the 'dump something interesting' monitors) is generally controlled by the RecognizerMonitor.   The Configuration Monitor defines what is to be dumped (in this case the configuration), while the RecognizerMonitor indicates when it should be dumped.  Let's configure a recognizer to dump the configuration after the recognizer is allocated (that is, when the recognizer is completely initialized and ready to recognize).
<!-- add the recognizer monitor to the recognizer -->
<component name="connectedDigitsRecognizer"
type="edu.cmu.sphinx.recognizer.Recognizer">
<property name="decoder" value="digitsDecoder"/>
<propertylist name="monitors">
<item>accuracyTracker </item>
<item>speedTracker </item>
<item>memoryTracker </item>
<item>recognizerMonitor </item>
</propertylist>
</component>

<!-- create the recognizer monitor with the configMonitor as one of the dumpers -->

<component name="recognizerMonitor"
type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<propertylist name="allocatedMonitors">
<item>configMonitor </item>
</propertylist>
</component>

<!-- create the configMonitor -->
<component name="configMonitor"
type="edu.cmu.sphinx.instrumentation.ConfigMonitor">
<property name="showConfig" value="true"/>
</component>




Here's a snippet of the output:

 ============ config =============
batch:
logLevel = [DEFAULT]
skip = 0
totalBatches = [DEFAULT]
recognizer = connectedDigitsRecognizer
usePooledBatchManager = [DEFAULT]
count = 0
inputSource = streamDataSource
whichBatch = [DEFAULT]

connectedDigitsRecognizer:
logLevel = [DEFAULT]
monitors = accuracyTracker, speedTracker, memoryTracker, recognizerMonitor
decoder = digitsDecoder

digitsDecoder:
searchManager = searchManager
logLevel = [DEFAULT]
featureBlockSize = [DEFAULT]

searchManager:
scorer = threadedScorer
activeListFactory = activeList
logLevel = [DEFAULT]
pruner = trivialPruner
logMath = logMath
growSkipInterval = [DEFAULT]
showTokenCount = [DEFAULT]
wantEntryPruning = [DEFAULT]
linguist = flatLinguist
relativeWordBeamWidth = [DEFAULT]

logMath:
logLevel = [DEFAULT]
useAddTable = true
logBase = 1.0001

 

Plotting Component connections
The configuration monitor can also dump out a graphical plot of the components and their connections.  The plot is in GDL format, which can be plotted with the aiSee graph visualization program.  Here's a sample of the output:

[Figure: GDL plot of the components and their connections]

To generate a component dump, set the showConfigAsGDL property of the configuration monitor to true. This will dump the GDL plot to a file called "config.gdl".

Other Configuration Dump
There are some other configuration dumps in the works, including a configuration dumper that outputs the current configuration in HTML format with hyperlinks to the appropriate JavaDoc component documentation.

Linguist GDLDumper

The linguist GDL dumper dumps a GDL plot of the search graph.  The search graph is the primary data structure used by the recognizer during the decode process. Note that the graph can become very large even for very small vocabularies.  Here's a configuration for the linguist dumper:

    <component name="recognizerMonitor" 
type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
<property name="recognizer" value="isolatedDigitsRecognizer"/>
<propertylist name="allocatedMonitors">
<item>linguistDumper </item>
</propertylist>
</component>


<component name="linguistDumper"
type="edu.cmu.sphinx.linguist.util.GDLDumper">
<property name="linguist" value="flatLinguist"/>
<property name="logMath" value="logMath"/>
</component>


Here's a reduced-size image of a plot generated by the GDL Dumper for the TI46 word list test:

[Figure: GDL Dumper output]


Linguist Stats Dumper

This useful dumper shows statistics about the search space.   Here's the config:
    <component name="recognizerMonitor" 
type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
<property name="recognizer" value="${recognizer}"/>
<propertylist name="allocatedMonitors">
<item>linguistStats </item>
</propertylist>
</component>

<component name="linguistStats"
type="edu.cmu.sphinx.linguist.util.LinguistStats">
<property name="linguist" value="${linguist}"/>
</component>

The Linguist Stats dumper shows the total number of states in the search space as well as the total number of states of each type.   Here's some sample output:
# ----------- linguist stats ------------ 
# Total states: 256
# class edu.cmu.sphinx.linguist.flat.PronunciationState: 13
# class edu.cmu.sphinx.linguist.flat.NonEmittingHMMState: 46
# class edu.cmu.sphinx.linguist.flat.ExtendedUnitState: 46
# class edu.cmu.sphinx.linguist.flat.BranchState: 12
# class edu.cmu.sphinx.linguist.flat.HMMStateState: 138
# class edu.cmu.sphinx.linguist.flat.GrammarState: 1

Note that for larger tasks, the linguist stats dumper may take a very long time to run since it needs to visit every possible state in the search graph. Even for a relatively small task like the rm1 bigram task, the dumper can take several minutes to work its way through the search graph.

Copyright 1999-2004 Carnegie Mellon University.
Portions Copyright 2002-2004 Sun Microsystems, Inc.
Portions Copyright 2002-2004 Mitsubishi Electric Research Laboratories.
All Rights Reserved. Usage is subject to license terms.