Instrumentation for Sphinx-4
Introduction
Sphinx-4 can be configured to output various collections of information
that may be useful for users and developers. This information includes:
- Warning and Error messages
- Logging / tracing messages
- Recognition results
- Accuracy statistics
- Speed statistics
- Memory footprint statistics
- Configuration information
- Grammar plots
- Search space plots
The output of each type of instrumentation information is
controlled from the configuration file. Let's look in
detail at what information is displayed and how to control
what is output.
Silence is Golden
First, let's look at a Sphinx-4 configuration file for a tidigits
task. (You can learn more about configuration files by reading Sphinx-4
Configuration Management). The Sphinx-4 configuration file silent.config.xml shows a standard
configuration for recognizing connected digits. It is based upon
the tidigits.config.xml found in sphinx4/tests/performance/tidigits,
except that all logging and instrumentation has been disabled. If
we run Sphinx-4 with this configuration we get absolutely no output:
% java edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
%
That is probably not very useful for most applications. Let's
see if we can get some recognition results using the logger.
Using the Logger
Any well-behaved Sphinx-4 component (there are some that are not so well
behaved...) that needs to output informational messages does so via
the Sphinx-4 logger. Each message has a level of importance associated
with it. Some messages indicate severe
problems, some are warnings, some are informational, and some are fine-grained
tracing messages (a short sketch of emitting such messages follows the list below).
The complete set of log levels is:
- SEVERE (highest value) - an error has occurred that makes continuing
the operation difficult or impossible
- WARNING - an anomaly has occurred, but the operation is
continuing
- INFO - general information
- CONFIG - information about a component's configuration
- FINE - tracing messages
- FINER - finer-grained tracing messages (lots of output)
- FINEST (lowest value) - finest-grained tracing messages (huge
amounts of output)
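The terse output shown later in this document comes from a
java.util.logging.ConsoleHandler, so these are the standard
java.util.logging levels. As a rough, hypothetical sketch (not actual
Sphinx-4 source; the logger name "myComponent" and the messages are
invented), a component emitting messages at these levels might look
like this:
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: a component emitting messages at several levels.
public class LoggingSketch {
    private static final Logger logger = Logger.getLogger("myComponent");

    public static void main(String[] args) {
        // Only messages at or above this level are published by the logger.
        logger.setLevel(Level.INFO);

        logger.severe("I/O error during decoding: input file is missing");
        logger.warning("Audio appears to contain only silence");
        logger.info("Result: <sil> one one one");

        // These fall below INFO and are therefore suppressed.
        logger.config("# of scoring threads: 1");
        logger.fine("entering the scoring phase");
        logger.finest("per-frame token dump");
    }
}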
In silent.config.xml there is a global property
called logLevel that is set
to OFF.
<config>
<property name="logLevel" value="OFF"/>
<!-- components omitted -->
</config>
This indicates that, by default, no logging information is written
to the console at all. This, of course, is dangerous because we
probably want to see at least the warning and error
messages. We can turn them on by
setting the logLevel to WARNING like so:
<config>
<property name="logLevel" value="WARNING"/>
<!-- components omitted -->
</config>
By setting the logLevel to WARNING, we are saying that we want to see
all log messages at the WARNING level or higher. With this setting we
should see WARNING and SEVERE messages. (Note that this is the
default setting anyway, so if you omit setting logLevel at the global
level, the logLevel is automatically set to WARNING).
Let's run this again with our new settings:
% java edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
%
It is still silent, which means we don't have any warnings or errors in
our run. Now, let's see what an error looks like. To force an
error, I'll delete one of the audio input files listed in the tidigits.batch file. This should
cause an error when the recognizer attempts to decode the
missing file. Here's an example:
% java edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
07:37.604 SEVERE I/O error during decoding: /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.o5o6671a.wav.raw\
(No such file or directory) in edu.cmu.sphinx.tools.batch.BatchModeRecognizer:decode
%
This time we get a SEVERE error report showing when and where the error
occurred. Note that the log includes information such as the timestamp
of the error, the level of the error, a detailed error message, and an
indication of where in the code the error occurred.
Now let's restore the missing file so we don't get this error anymore
and try to get some results displayed.
The JavaDocs for the BatchModeRecognizer
indicate that the BatchModeRecognizer will log results at the INFO
level. Let's try setting the logLevel to INFO to see what the
BatchModeRecognizer reports.
<config>
<property name="logLevel" value="INFO"/>
<!-- components omitted -->
</config>
By setting the logLevel to INFO we are enabling logging at the INFO,
WARNING, and SEVERE levels.
With this new setting, let's run the recognizer again to see what output
we get:
% java edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
08:23.006 INFO logMath Log base is 1.0001
08:23.020 INFO logMath Using AddTable when adding logs
08:23.021 INFO logMath LogAdd table has 99022 entries.
08:23.683 INFO sphinx3Loader Sphinx3Loader
08:23.684 INFO sphinx3Loader Pool wd_dependent_phone.cd_continuous_8gau/means Entries: 4816
08:23.686 INFO sphinx3Loader Pool wd_dependent_phone.cd_continuous_8gau/variances Entries: 4816
08:23.687 INFO sphinx3Loader Pool wd_dependent_phone.cd_continuous_8gau/transition_matrices Entries: 34
08:23.688 INFO sphinx3Loader Pool senones Entries: 602
08:23.689 INFO sphinx3Loader Pool meanTransformationMatrix Entries: 1
08:23.690 INFO sphinx3Loader Pool meanTransformationMatrix Entries: 1
08:23.691 INFO sphinx3Loader Pool varianceTransformationMatrix Entries: 1
08:23.692 INFO sphinx3Loader Pool varianceTransformationMatrix Entries: 1
08:23.693 INFO sphinx3Loader Pool wd_dependent_phone.cd_continuous_8gau/mixture_weights Entries: 602
08:23.694 INFO sphinx3Loader Pool senones Entries: 602
08:23.696 INFO sphinx3Loader Context Independent Unit Entries: 34
08:23.697 INFO sphinx3Loader HMM Manager: 430 hmms
08:23.698 INFO acousticModel CompositeSenoneSequences: 0
08:23.700 INFO dictionary Loading dictionary from:
08:23.701 INFO dictionary file:/lab/speech/sphinx4/data/tidigits_8gau_13dCep_16k_40mel_130Hz_6800Hz.bin.zip/dictionary
08:23.712 INFO dictionary Loading filler dictionary from:
08:23.714 INFO dictionary file:/lab/speech/sphinx4/data/tidigits_8gau_13dCep_16k_40mel_130Hz_6800Hz.bin.zip/fillerdict
08:23.728 INFO wordListGrammar Num nodes : 14
08:23.729 INFO wordListGrammar Num arcs : 34
08:23.731 INFO wordListGrammar Avg arcs : 2.4285715
08:23.306 INFO threadedScorer # of scoring threads: 1
08:23.393 INFO batch BatchDecoder: decoding files in tidigits.batch
08:23.173 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.111a.wav.raw
08:23.175 INFO batch Result: <sil> one one one
08:23.645 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.139oa.wav.raw
08:23.647 INFO batch Result: <sil> one three nine oh
08:24.957 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.155a.wav.raw
08:24.958 INFO batch Result: <sil> one five five
08:24.278 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1688a.wav.raw
08:24.279 INFO batch Result: <sil> one six eight eight
08:24.987 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1a.wav.raw
08:24.988 INFO batch Result: <sil> one
It looks like there are a number of other components issuing
INFO messages that clutter up our output. We'd like to be
able to turn the other INFO messages off and get just the
BatchModeRecognizer INFO messages. We can do this by setting the
logLevel at the individual component level. Each component can
have its own logging level, which means that different
components can log messages at different levels. Since we
only want the BatchModeRecognizer to be outputting INFO messages, let's
restore the overall logging level to WARNING and set the logLevel for
'batch' (the name of the BatchModeRecognizer component) to INFO.
<config>
<property name="logLevel" value="INFO"/>
<component name="batch"
type="edu.cmu.sphinx.tools.batch.BatchModeRecognizer">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="inputSource" value="streamDataSource"/>
<property name="logLevel" value="INFO"/>
</component>
<!-- many components omitted -->
</config>
Now let's look at our output:
% java -cp ../../../bld/classes/ -Dskip=20 edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
08:26.591 INFO batch BatchDecoder: decoding files in tidigits.batch
08:26.260 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.111a.wav.raw
08:26.262 INFO batch Result: <sil> one one one
08:26.749 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.139oa.wav.raw
08:26.751 INFO batch Result: <sil> one three nine oh
08:26.105 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.155a.wav.raw
08:26.107 INFO batch Result: <sil> one five five
08:26.390 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1688a.wav.raw
08:26.391 INFO batch Result: <sil> one six eight eight
08:26.022 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1a.wav.raw
08:26.023 INFO batch Result: <sil> one
08:26.029 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1b.wav.raw
08:26.030 INFO batch Result: <sil> one
08:26.048 INFO batch File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1za.wav.raw
08:26.049 INFO batch Result: <sil> one zero
There are also ways
to control the terseness of the actual output. Setting
the logTerse property
to true results in the ancillary information (timestamp, level,
source component) being omitted.
<config>
<property name="logLevel" value="INFO"/>
<component name="batch"
type="edu.cmu.sphinx.tools.batch.BatchModeRecognizer">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="inputSource" value="streamDataSource"/>
<property name="logLevel" value="INFO"/>
<property name="logTerse" value="true"/>
</component>
</config>
Here's the terse output:
% java -cp ../../../bld/classes/ -Dskip=20 edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
Handler java.util.logging.ConsoleHandler@cdfc9c
BatchDecoder: decoding files in tidigits.batch
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.111a.wav.raw
Result: <sil> one one one
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.139oa.wav.raw
Result: <sil> one three nine oh
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.155a.wav.raw
Result: <sil> one five five
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1688a.wav.raw
Result: <sil> one six eight eight
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1a.wav.raw
Result: <sil> one
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1b.wav.raw
Result: <sil> one
File : /lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1za.wav.raw
Result: <sil> one zero
At this point we know enough about the logger to be able to turn it on
and off, to control the level of logging output on a per-component
basis, and to configure the appearance of the logging output.
Tracking Accuracy
Now let's look at how we can track the recognition accuracy of
Sphinx-4. One of the primary measures of the overall
quality of a speech recognition system is its recognition accuracy. This
statistic shows how well the sentence hypotheses produced by the
recognizer match the actual transcripts of what was spoken.
Obviously, recognition accuracy can only be reported when the
transcripts are available as well. All of the Sphinx-4
performance tests (found under the Sphinx-4/tests/performance
directory) include transcripts. For instance, the batch file tidigits.batch begins like so:
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.111a.wav.raw one one one
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.139oa.wav.raw one three nine oh
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.155a.wav.raw one five five
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1688a.wav.raw one six eight eight
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1a.wav.raw one
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1b.wav.raw one
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.1za.wav.raw one zero
/lab/speech/sphinx4/data/tidigits/test/raw16k/man/man.ah.24z982za.wav.raw two four
Each line represents a single utterance. The first entry on each line
contains the path name to the audio that is to be recognized. The
remaining entries are the words that make up the transcript for the
utterance. Using this information the BatchModeRecognizer can
make available the transcripts necessary for producing accuracy
statistics.
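As an illustration of this format (not part of Sphinx-4; the class name
is invented and the batch file name is assumed to be tidigits.batch), a
batch line can be split into the audio path and its reference transcript
like this:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: split each batch-file line into the audio
// path (first token) and the reference transcript (remaining tokens).
public class BatchFileSketch {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("tidigits.batch"));
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] tokens = line.trim().split("\\s+");
            String audioPath = tokens[0];
            String transcript = String.join(" ",
                    Arrays.copyOfRange(tokens, 1, tokens.length));
            System.out.println("audio:      " + audioPath);
            System.out.println("transcript: " + transcript);
        }
    }
}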
The accuracy tracker is a component that is typically added to the set
of monitors for a recognizer. The accuracy tracker will monitor
the recognizer, and when the recognizer generates a result, the tracker
will compare the resulting hypothesis to the appropriate transcript and
generate the statistics.
Let's configure our system now to include an accuracy tracker.
First we add an entry for the component itself:
<component name="accuracyTracker"
type="edu.cmu.sphinx.instrumentation.AccuracyTracker">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="showAlignedResults" value="false"/>
<property name="showRawResults" value="false"/>
</component>
Next, we add the accuracy tracker to the set of recognizer monitors
like so:
<component name="connectedDigitsRecognizer"
type="edu.cmu.sphinx.recognizer.Recognizer">
<property name="decoder" value="digitsDecoder"/>
<propertylist name="monitors">
<item>accuracyTracker </item>
</propertylist>
</component>
Also, since the accuracy tracker will output results, we can turn off
the output of results by the 'batch' component by resetting its logLevel to WARNING.
Here's the output:
% java -cp ../../../bld/classes/ -Dskip=20 edu.cmu.sphinx.tools.batch.BatchModeRecognizer silent.config.xml tidigits.batch
(... many lines omitted)
REF: four one six
HYP: four one six
ALIGN_REF: four one six
ALIGN_HYP: four one six
RAW <sil> four one six
Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 78 Matches: 78 WER: 0.000%
Sentences: 23 Matches: 23 SentenceAcc: 100.000%
REF: four two eight oh oh oh nine
HYP: four two eight oh oh nine
ALIGN_REF: four two eight oh oh OH nine
ALIGN_HYP: four two eight oh oh ** nine
RAW <sil> four two eight oh oh nine
Accuracy: 98.824% Errors: 1 (Sub: 0 Ins: 0 Del: 1)
Words: 85 Matches: 84 WER: 1.176%
Sentences: 24 Matches: 23 SentenceAcc: 95.833%
REF: four five two zero three
HYP: four five two zero three
ALIGN_REF: four five two zero three
ALIGN_HYP: four five two zero three
RAW <sil> four five two zero three
As you can see, the accuracy tracker outputs quite a bit of
information. Let's look at it in detail:
- REF (Reference) - The reference or transcript; this is what should be recognized.
- HYP (Hypothesis) - The result generated by the recognizer; this is what was recognized.
- ALIGN_REF (Aligned Reference) - The reference text, with mismatches between
the reference and the hypothesis highlighted.
- ALIGN_HYP (Aligned Hypothesis) - The recognized text with mismatched
text highlighted.
- RAW (Raw Text) - The actual text recognized, including all filler
words such as silences, coughs, lip smacks, breaths and so on.
- Accuracy (Word Accuracy) - The number of matching words compared to the total
number of words in the input, as a percentage.
- Errors (Word Error Count) - The total number of word errors.
- Sub (Substitution count) - The total number of substitution errors. A substitution
error occurs when one word is replaced by another.
- Ins (Insertion count) - The total number of insertion errors. An insertion
error occurs when an extra word is inserted in the hypothesis.
- Del (Deletion count) - The total number of deletion errors. A deletion error
occurs when a word is missing from the hypothesis.
- Words (Reference word count) - The total number of words expected.
- Matches (Matching word count) - The total number of matching words.
- WER (Word error rate) - Equal to (Sub + Ins + Del) / Words * 100.
- Sentences (Reference sentence count) - The total number of sentences.
- Matches (Matching sentences) - The total number of matching sentences.
- SentenceAcc (Sentence Accuracy) - Equal to (Matches / Sentences) * 100.
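To make the arithmetic concrete, consider the second block of output
above: there is one deletion error against 85 cumulative reference
words, so WER = 1 / 85 * 100 ≈ 1.176% and word accuracy = 84 / 85 * 100
≈ 98.824%, while 23 of the 24 sentences so far match exactly, giving a
sentence accuracy of 23 / 24 * 100 ≈ 95.833%. The Sub, Ins and Del
counts come from a word-level alignment of the hypothesis against the
reference. The following sketch is not the Sphinx-4 AccuracyTracker
source; it is just a minimal illustration of the standard edit-distance
alignment that such statistics are based on:
// Illustrative sketch (not Sphinx-4 source): count the substitutions,
// insertions and deletions needed to turn the reference word sequence
// into the hypothesis, then report the word error rate.
public class WerSketch {
    public static void main(String[] args) {
        String[] ref = "four two eight oh oh oh nine".split(" ");
        String[] hyp = "four two eight oh oh nine".split(" ");

        // cost[i][j] = minimal edits to align ref[0..i) with hyp[0..j)
        int[][] cost = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) cost[i][0] = i;
        for (int j = 0; j <= hyp.length; j++) cost[0][j] = j;
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int same = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                cost[i][j] = Math.min(cost[i - 1][j - 1] + same,     // match/sub
                             Math.min(cost[i - 1][j] + 1,            // deletion
                                      cost[i][j - 1] + 1));          // insertion
            }
        }

        // Walk back through the table to count each error type.
        int sub = 0, ins = 0, del = 0;
        int i = ref.length, j = hyp.length;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && ref[i - 1].equals(hyp[j - 1])
                    && cost[i][j] == cost[i - 1][j - 1]) {
                i--; j--;                                            // words match
            } else if (i > 0 && j > 0 && cost[i][j] == cost[i - 1][j - 1] + 1) {
                sub++; i--; j--;                                     // substitution
            } else if (i > 0 && cost[i][j] == cost[i - 1][j] + 1) {
                del++; i--;                                          // deletion
            } else {
                ins++; j--;                                          // insertion
            }
        }

        double wer = 100.0 * (sub + ins + del) / ref.length;
        System.out.printf("Sub: %d Ins: %d Del: %d WER: %.3f%%%n", sub, ins, del, wer);
        // For this single pair: Sub: 0 Ins: 0 Del: 1, WER = 1 / 7 * 100 = 14.286%
    }
}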
That's a whole lot of information; in fact, it is probably more than we need.
We can, of course, configure the accuracy tracker to reduce the
amount of output. Let's turn off the ALIGN and RAW outputs:
<component name="accuracyTracker"
type="edu.cmu.sphinx.instrumentation.AccuracyTracker">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="showAlignedResults" value="false"/>
<property name="showRawResults" value="false"/>
</component>
The accuracy tracker will also show summary information at the end of a
run (when the recognizer is deallocated). Here's an example
showing the reduced output and the summary information.
REF: one
HYP: one
Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 15 Matches: 15 WER: 0.000%
Sentences: 5 Matches: 5 SentenceAcc: 100.000%
# --------------- Summary statistics ---------
Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 15 Matches: 15 WER: 0.000%
Sentences: 5 Matches: 5 SentenceAcc: 100.000%
The summary statistics show the total accuracy data for the entire run.
Tracking Speed
Another important aspect of speech recognition is its speed.
The speed tracker tracks and reports statistics
relating to the speed of recognition. It is added
to the recognizer's set of monitors in the same way as the
accuracy tracker:
<component name="connectedDigitsRecognizer"
type="edu.cmu.sphinx.recognizer.Recognizer">
<property name="decoder" value="digitsDecoder"/>
<propertylist name="monitors">
<item>accuracyTracker </item>
<item>speedTracker </item>
</propertylist>
</component>
<component name="speedTracker"
type="edu.cmu.sphinx.instrumentation.SpeedTracker">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="frontend" value="${frontend}"/>
</component>
Here's some output of the speed tracker:
REF: one one one
HYP: one one one
Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 3 Matches: 3 WER: 0.000%
Sentences: 1 Matches: 1 SentenceAcc: 100.000%
This Time Audio: 1.38s Proc: 2.16s Speed: 1.56 X real time
Total Time Audio: 1.38s Proc: 2.16s Speed: 1.56 X real time
REF: one three nine oh
HYP: one three nine oh
Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 7 Matches: 7 WER: 0.000%
Sentences: 2 Matches: 2 SentenceAcc: 100.000%
This Time Audio: 1.47s Proc: 0.97s Speed: 0.66 X real time
Total Time Audio: 2.85s Proc: 3.13s Speed: 1.10 X real time
The data output by the speed tracker are:
- This Time Audio - The length of time (in seconds) of the current audio.
- This Time Proc - The time spent processing this audio.
- This Speed - Processing time / audio time.
- Total Time Audio - The length of all audio so far.
- Total Time Proc - The time spent processing all audio so far.
- Total Speed - Total processing time / total audio time.
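For example, in the output above the first utterance contains 1.38
seconds of audio and took 2.16 seconds to process, so its speed is
2.16 / 1.38 ≈ 1.56 times real time. The second utterance runs faster
than real time (0.97 / 1.47 ≈ 0.66), and the cumulative totals (2.85s
of audio, 3.13s of processing) give the running figure of
3.13 / 2.85 ≈ 1.10 times real time.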
Dumping Response Time
The speed tracker can also be configured to show response time. This is
useful when running in a live-mode situation where front-end buffering
latency can affect the perceived performance of the system.
The speed tracker configuration for enabling tracking of response time
is shown here:
<component name="speedTracker"
type="edu.cmu.sphinx.instrumentation.SpeedTracker">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<property name="frontend" value="${frontend}"/>
<property name="showResponseTime" value="true"/>
</component>
The response time output looks like this:
HYP: one three nine oh
Sentences: 2
This Time Audio: 1.15s Proc: 0.86s Speed: 0.75 X real time
Total Time Audio: 3.33s Proc: 3.52s Speed: 1.06 X real time
Response Time: Avg: 0.032333333s Max: 0.085s Min: 0.0060s
The response time field includes the average (Avg), maximum (Max), and
minimum (Min) response times encountered. This is the time from when the
front end first encounters a packet of audio until it is delivered to
the decoding portion of the recognizer. This gives a good measure
of the latency due to front-end processing such as normalization
and end-pointing.
Dumping Timing Statistics
The speed tracker can also be configured to dump out low level timing
data for various aspects of the recognition process. Many of the
components in the Sphinx-4 system will collect detailed timing
statistics. For instance, the linguist may keep track of how long
it takes to build the search graph, and the acoustic model loader may
keep track of how long it takes to load the acoustic model from a
compressed file.
Setting the speed tracker's showTimers property
to true causes the timing information to be dumped. The timing
information is dumped immediately after the system is initialized, and
again when the recognizer is deallocated. Here's a sample of the
timing output:
# ----------------------------- Timers----------------------------------------
# Name Count CurTime MinTime MaxTime AvgTime TotTime
streamDataSourc 196 0.0000s 0.0000s 0.0390s 0.0004s 0.0760s
premphasizer 196 0.0000s 0.0000s 0.0130s 0.0001s 0.0190s
windower 194 0.0010s 0.0000s 0.0550s 0.0009s 0.1840s
fft 1732 0.0000s 0.0000s 0.0530s 0.0003s 0.4780s
melFilterBank 1732 0.0000s 0.0000s 0.0410s 0.0000s 0.0790s
dct 1732 0.0000s 0.0000s 0.0280s 0.0001s 0.0920s
featureExtracti 1692 0.0000s 0.0000s 0.1610s 0.0001s 0.1980s
AM_Load 1 2.3060s 2.3060s 2.3060s 2.3060s 2.3060s
DictionaryLoad 1 0.0110s 0.0110s 0.0110s 0.0110s 0.0110s
compile 1 0.8750s 0.8750s 0.8750s 0.8750s 0.8750s
createGStates 1 0.0260s 0.0260s 0.0260s 0.0260s 0.0260s
collectContex 1 0.0050s 0.0050s 0.0050s 0.0050s 0.0050s
expandStates 1 0.7250s 0.7250s 0.7250s 0.7250s 0.7250s
connectNodes 1 0.0140s 0.0140s 0.0140s 0.0140s 0.0140s
scoring 1722 0.0000s 0.0000s 0.3980s 0.0037s 6.4570s
pruning 1712 0.0000s 0.0000s 0.0010s 0.0000s 0.0100s
growing 1722 0.0030s 0.0000s 0.0610s 0.0027s 4.666
This table shows the timing information after a short run of the tidigits
word list test. Here's the data key:
- Name - The name of the operation.
- Count - The number of times the operation was invoked.
- CurTime - The most recent timing for this operation.
- MinTime - The fastest time for this operation.
- MaxTime - The slowest time for this operation.
- AvgTime - The average time for this operation.
- TotTime - The total time for this operation.
Tracking Memory Usage
For some applications, the overall memory footprint of the recognizer
is important. The MemoryTracker can be used to track the memory
usage of Sphinx-4. The MemoryTracker is added
to the set of monitors in the recognizer in the same way that the
accuracy tracker is added:
<component name="connectedDigitsRecognizer"
type="edu.cmu.sphinx.recognizer.Recognizer">
<property name="decoder" value="digitsDecoder"/>
<propertylist name="monitors">
<item>accuracyTracker </item>
<item>speedTracker </item>
<item>memoryTracker </item>
</propertylist>
</component>
<component name="memoryTracker"
type="edu.cmu.sphinx.instrumentation.MemoryTracker">
<property name="recognizer" value="connectedDigitsRecognizer"/>
</component>
The output of the memory tracker is as follows:
REF: one
HYP: one
Accuracy: 100.000% Errors: 0 (Sub: 0 Ins: 0 Del: 0)
Words: 16 Matches: 16 WER: 0.000%
Sentences: 6 Matches: 6 SentenceAcc: 100.000%
This Time Audio: 0.99s Proc: 0.64s Speed: 0.65 X real time
Total Time Audio: 7.47s Proc: 6.13s Speed: 0.82 X real time
Mem Total: 126.62 Mb Free: 112.26 Mb
Used: This: 14.36 Mb Avg: 14.35 Mb Max: 18.82 Mb
The memory tracker outputs five data items:
- Mem Total - The total amount of memory allocated to the VM.
- Free - How much of Mem Total is currently not being used.
- Used This - How much memory is currently being used.
- Used Avg - The average amount of memory used.
- Used Max - The maximum amount of memory used.
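In the output above, for example, the current usage is simply Mem Total
minus Free: 126.62 Mb - 112.26 Mb = 14.36 Mb, which is the value
reported as Used: This.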
Miscellaneous Instrumentation
In addition to the previously described instrumentation, there are a
few other monitors that are useful.
Configuration Monitor
The configuration monitor dumps out the current configuration of the
system. This dump differs from the configuration file in a few
ways:
- The format is more readable by humans than the XML format
- The configuration dump shows just the active configuration,
whereas the XML configuration file may have configuration data for
components that are not actually used.
- The configuration dump shows the configuration data after any
properties have been set via system properties (as is often done in the
build.xml file).
- The configuration dump shows all the possible properties for a
particular component and highlights the values that are receiving their
default values.
The Configuration Monitor (like most of the 'dump something
interesting' monitors) is generally controlled by the
RecognizerMonitor. The Configuration Monitor defines what
is to be dumped (in this case the configuration), while the
RecognizerMonitor indicates when it should be dumped. Let's
configure a recognizer to dump the configuration after the recognizer
is allocated (that is, the recognizer is completely initialized and
ready to recognize).
<!-- add the recognizer monitor to the recognizer -->
<component name="connectedDigitsRecognizer"
type="edu.cmu.sphinx.recognizer.Recognizer">
<property name="decoder" value="digitsDecoder"/>
<propertylist name="monitors">
<item>accuracyTracker </item>
<item>speedTracker </item>
<item>memoryTracker </item>
<item>recognizerMonitor </item>
</propertylist>
</component>
<!-- create the recognizer monitor with the configMonitor as one of the dumpers -->
<component name="recognizerMonitor"
type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
<property name="recognizer" value="connectedDigitsRecognizer"/>
<propertylist name="allocatedMonitors">
<item>configMonitor </item>
</propertylist>
</component>
<!-- create the configMonitor -->
<component name="configMonitor"
type="edu.cmu.sphinx.instrumentation.ConfigMonitor">
<property name="showConfig" value="true"/>
</component>
Here's a snippet of the output:
============ config =============
batch:
logLevel = [DEFAULT]
skip = 0
totalBatches = [DEFAULT]
recognizer = connectedDigitsRecognizer
usePooledBatchManager = [DEFAULT]
count = 0
inputSource = streamDataSource
whichBatch = [DEFAULT]
connectedDigitsRecognizer:
logLevel = [DEFAULT]
monitors = accuracyTracker, speedTracker, memoryTracker, recognizerMonitor
decoder = digitsDecoder
digitsDecoder:
searchManager = searchManager
logLevel = [DEFAULT]
featureBlockSize = [DEFAULT]
searchManager:
scorer = threadedScorer
activeListFactory = activeList
logLevel = [DEFAULT]
pruner = trivialPruner
logMath = logMath
growSkipInterval = [DEFAULT]
showTokenCount = [DEFAULT]
wantEntryPruning = [DEFAULT]
linguist = flatLinguist
relativeWordBeamWidth = [DEFAULT]
logMath:
logLevel = [DEFAULT]
useAddTable = true
logBase = 1.0001
Plotting Component Connections
The configuration monitor can also dump out a graphical plot of the
components and their connections. The plot is in GDL format, which can be
rendered with the aiSee graph
visualization program. Here's a sample of the output:
[GDL plot of the component connections]
To generate a component dump, set the showConfigAsGDL
property of the configuration monitor to true. This will dump the GDL
plot to a file called "config.gdl".
Other Configuration Dump
There are some other configuration dumps in the works, including a
configuration dumper that outputs the current configuration in HTML
format with hyperlinks to the appropriate JavaDoc component
documentation.
Linguist GDLDumper
The linguist GDL dumper dumps a GDL plot of the search graph. The
search graph is the primary data structure used by the recognizer
during decoding. Note that the graph can become very large
even for very small vocabularies. Here's a configuration for the
LinguistDumper:
<component name="recognizerMonitor"
type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
<property name="recognizer" value="isolatedDigitsRecognizer"/>
<propertylist name="allocatedMonitors">
<item>linguistDumper </item>
</propertylist>
</component>
<component name="linguistDumper"
type="edu.cmu.sphinx.linguist.util.GDLDumper">
<property name="linguist" value="flatLinguist"/>
<property name="logMath" value="logMath"/>
</component>
Here's a reduced-size image of a plot generated by the GDL dumper for the
TI46 word list test:
[Search graph plot for the TI46 word list test]
Linguist Stats Dumper
This useful dumper shows statistics about the search space.
Here's the config:
<component name="recognizerMonitor"
type="edu.cmu.sphinx.instrumentation.RecognizerMonitor">
<property name="recognizer" value="${recognizer}"/>
<propertylist name="allocatedMonitors">
<item>linguistStats </item>
</propertylist>
</component>
<component name="linguistStats"
type="edu.cmu.sphinx.linguist.util.LinguistStats">
<property name="linguist" value="${linguist}"/>
</component>
The Linguist Stats dumper shows the total number of states in the search
space as well as the total number of states of each type.
Here's some sample output:
# ----------- linguist stats ------------
# Total states: 256
# class edu.cmu.sphinx.linguist.flat.PronunciationState: 13
# class edu.cmu.sphinx.linguist.flat.NonEmittingHMMState: 46
# class edu.cmu.sphinx.linguist.flat.ExtendedUnitState: 46
# class edu.cmu.sphinx.linguist.flat.BranchState: 12
# class edu.cmu.sphinx.linguist.flat.HMMStateState: 138
# class edu.cmu.sphinx.linguist.flat.GrammarState: 1
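As a quick sanity check, the per-class counts sum to the reported
total: 13 + 46 + 46 + 12 + 138 + 1 = 256 states.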
Note that for larger tasks, the linguist stats dumper may take a very
long time to run since it needs to visit every possible state in the
search graph. Even for a relatively small task like the rm1 bigram
task, the dumper can take several minutes to work its way through the
search graph.
Copyright 1999-2004 Carnegie Mellon
University.
Portions Copyright 2002-2004 Sun Microsystems, Inc.
Portions Copyright 2002-2004 Mitsubishi Electric Research
Laboratories.
All Rights Reserved. Usage is subject to license
terms.