Speech for Java v1.0
Speech for Java is a Java SDK that gives Java application developers access to IBM ViaVoice speech technology. The SDK supports voice command recognition, dictation, and text-to-speech synthesis.
The SDK is an implementation of most of Version 1.0 of the Java Speech API. The Java Speech API is a cross-platform speech API that was developed by Sun Microsystems Inc. in collaboration with IBM and other speech technology companies. More information on the Java Speech API can be found at the Java Speech API home page.
In much the same way that Java implementations are built on top of the native operating system GUI capabilities, Speech for Java is built on top of the native speech recognition and synthesis capabilities in IBM ViaVoice. The SDK therefore requires ViaVoice to be installed on your computer; ViaVoice is not provided as part of this package. You may find more information about ViaVoice at the ViaVoice Home Page.
Your computer should meet the following minimum requirements for running IBM ViaVoice:
The SDK will take advantage of any enrollment, dictation macros, and added words in your ViaVoice installation.
On Windows, after unpacking the installation package to its own directory you should have the following files:
- README.html: This file
- install.bat: Installation script
- lib/ibmjs.jar: The Java portion of the SDK
- lib/*.dll: The native portion of the SDK
- hello/: The Hello sample application
- ref/: Java Speech API reference documentation. See index.
Suppose you have unpacked the installation package to a directory dir (e.g. c:\ibmjs). You should do the following:
- modify your CLASSPATH variable to include dir\lib\ibmjs.jar,
- modify your PATH variable to include the dir\lib directory, and
- from dir, execute install.bat to register the IBM engines with the system.
On Linux, after unpacking the installation package to its own directory you should have the following files:
- README.html: This file
- install.sh: Installation script
- lib/ibmjs.jar: The Java portion of the SDK
- lib/*.so: The native portion of the SDK
- hello/: The Hello sample application
- ref/: Java Speech API reference documentation. See index.
Suppose you have unpacked the installation package to a directory dir (e.g. /usr/ibmjs). You should do the following:
- modify your CLASSPATH environment variable to include dir/lib/ibmjs.jar,
- modify your LD_LIBRARY_PATH environment variable to include dir/lib,
- make sure that the ViaVoice environment variables are set (e.g. by typing ". vvsetenv" or by including this in your login profile), and
- from dir, type "sh install.sh" to register the IBM engines with the system. (You may need to do this step as root because it involves creating or modifying a file in the JDK tree.)
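If the Hello program later fails to start, it can help to confirm what the JVM actually sees after the steps above. The following is a minimal sketch, not part of the SDK: the class name InstallCheck and its helper method are hypothetical, and it assumes a JVM that exposes the standard java.class.path and java.library.path system properties.

```java
import java.io.File;
import java.util.regex.Pattern;

public class InstallCheck {
    /** Returns true if any entry of a separator-delimited path list ends with the given suffix. */
    public static boolean hasEntryEndingWith(String pathList, String separator, String suffix) {
        if (pathList == null) {
            return false;
        }
        // Quote the separator so that it is treated literally by split().
        for (String entry : pathList.split(Pattern.quote(separator))) {
            if (entry.trim().endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String classPath = System.getProperty("java.class.path");
        String libraryPath = System.getProperty("java.library.path");

        // ibmjs.jar should appear here if CLASSPATH was modified as described.
        System.out.println("ibmjs.jar on class path: "
                + hasEntryEndingWith(classPath, File.pathSeparator, "ibmjs.jar"));

        // The dir/lib (or dir\lib) directory should appear in this list.
        System.out.println("java.library.path = " + libraryPath);
    }
}
```

Run it with the same environment you use to run the Hello program, so that it reports the same CLASSPATH and library path settings.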
Note: the following description is written assuming that your language is English. The Hello program also now supports French and German. Look at the corresponding hello*.gram and res*.properties files to find the equivalent instructions for other languages.
Note: on Windows, the Hello program requires a full-duplex sound card and driver. Some older sound cards and older drivers don't support simultaneous audio output (synthesis) and audio input (recognition). For some notes on working around this problem, please see the discussion section of the alphaWorks site.
To test your installation, open a command prompt, cd to the hello directory, and run the hello script (on Windows type "hello.bat"; on Linux type "sh hello.sh"). This will start the Hello program. The program will start out by saying "Hello human, my name is Computer, what is your name?" At this point the program is in command mode, and you may say one of the following:
- You say "My name is first last": the program says "Hello first last"
- You say "Goodbye": the program says "Bye now" and exits
- You say "Repeat after me": the program enters dictation mode
- You say "That's all": the program leaves dictation mode and repeats what you said
Note that since this is just a simple example, the Hello application has a very limited range of first and last names that it understands. You can look at the appropriate hello*.gram file to see which names it knows.
You may add your own name by modifying hello.gram and then restarting the Hello program. To add your own name, follow the pattern that you see in hello.gram. For example, if your name is John Doe, try adding a line that says "| John {John}" to the rule for <first>, and similarly for <last>. You should now be able to say "My name is John Doe" and get an appropriate response.
The part in curly brackets in hello.gram is a tag. Tags are a general-purpose mechanism of the Java Speech Grammar Format (JSGF); in this sample the tag determines how the computer says your name. Try modifying the tag for your name and restarting the Hello program, and see what the computer says when you say "My name is ...".
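To make the relationship between a spoken word and its tag concrete, here is a small sketch that uses only the standard Java library. It is not how the Hello program or the SDK processes grammars; the class TagSketch, its method, and the rule text "John {John} | Bob {Robert}" are made up for illustration. It splits a list of alternatives of the form "word {tag}" and looks up the tag for a recognized word.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagSketch {
    // One alternative of the form: spoken words {tag}
    private static final Pattern ALTERNATIVE =
            Pattern.compile("\\s*([^|{}]+?)\\s*\\{([^}]*)\\}\\s*");

    /** Maps each spoken alternative to the tag that follows it in curly brackets. */
    public static Map<String, String> parseAlternatives(String ruleBody) {
        Map<String, String> tags = new LinkedHashMap<>();
        for (String alternative : ruleBody.split("\\|")) {
            Matcher m = ALTERNATIVE.matcher(alternative);
            if (m.matches()) {
                tags.put(m.group(1), m.group(2).trim());
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hypothetical alternatives, in the style described for the <first> rule.
        Map<String, String> tags = parseAlternatives("John {John} | Bob {Robert}");

        // The user said "Bob", but the tag decides what is spoken back.
        System.out.println("Hello " + tags.get("Bob")); // prints "Hello Robert"
    }
}
```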
While in command mode, you can say "Repeat after me", at which point the computer will enter dictation mode and say "I'm listening". You may now dictate any text you like and finish by saying "That's all", at which point the program will repeat what you said and go back into command mode.
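The switch between command mode and dictation mode can be pictured as a small state machine. The sketch below is an illustration only and is not taken from the Hello source: the class HelloModes and its hear method are hypothetical, phrases arrive as plain strings rather than recognition results, and the name is simply echoed instead of being derived from grammar tags.

```java
public class HelloModes {
    public enum Mode { COMMAND, DICTATION, EXITED }

    private Mode mode = Mode.COMMAND;
    private final StringBuilder dictated = new StringBuilder();

    public Mode getMode() {
        return mode;
    }

    /** Returns what the program would say in response, or null if it stays silent. */
    public String hear(String phrase) {
        switch (mode) {
            case COMMAND:
                if (phrase.equals("Repeat after me")) {
                    mode = Mode.DICTATION;
                    return "I'm listening";
                }
                if (phrase.equals("Goodbye")) {
                    mode = Mode.EXITED;
                    return "Bye now";
                }
                if (phrase.startsWith("My name is ")) {
                    return "Hello " + phrase.substring("My name is ".length());
                }
                return null; // not a known command
            case DICTATION:
                if (phrase.equals("That's all")) {
                    mode = Mode.COMMAND;
                    String text = dictated.toString();
                    dictated.setLength(0);
                    return text; // repeat what was dictated
                }
                // Anything else is treated as dictated text and accumulated.
                if (dictated.length() > 0) {
                    dictated.append(' ');
                }
                dictated.append(phrase);
                return null;
            default:
                return null; // the program has exited
        }
    }
}
```

In this sketch, "Goodbye" spoken in dictation mode is captured as dictated text rather than acted on, which mirrors the idea that commands are only active in command mode.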
The source of the Hello program is included to help you understand the basics of writing a speech application using Speech for Java. A good starting point for learning about the Java Speech API is the Programmer's Guide located on the Sun web site.
The following reference documentation is also provided with this package:
For more information and news about the Java Speech API you may refer to the Java Speech API Home Page.
The SynthesizerProperties interface allows an application to programmatically change speaking attributes such as pitch, speed, gender, and age. However, as noted in the API spec, the exact timing of synthesizer attribute changes made via the API is not defined. Applications that wish to change speaking attributes in real time, synchronized with the words being spoken (for example, to deliver different messages in different voices, pitches, or rates), should therefore use JSML rather than the API to effect such changes.
However, currently no mechanism is defined in JSML for selecting a speaking voice with a given age or gender. Therefore, we have temporarily added an IBMVOICE tag to our implementation of JSML, which takes two attributes:
Sun is aware of the requirement for such a feature in JSML. If a mechanism for selecting the age and gender of a voice is added to a subsequent release of the JSML spec, that mechanism will be implemented and the IBMVOICE tag will be removed from our implementation.
Please also be aware that the SDK is based on the beta version of the JSML specification available at the time of the release of the SDK, and that this beta specification is subject to incompatible changes by Sun Microsystems Inc.
Speech for Java is an implementation of most of the Java Speech 1.0 API. A few methods are not yet implemented, including the following:
- Recognizer.readVendorGrammar
- Recognizer.writeVendorGrammar
- Recognizer.readVendorResult
- Recognizer.writeVendorResult
The following methods are implemented with limited functionality as described below:
- FinalResult.releaseTrainingInfo is a placeholder
- FinalResult.tokenCorrection is a placeholder
- Recognizer.getVocabManager always returns null
- Recognizer.getSpeakerManager always returns null
- RecognizerProperties.setCompleteTimeout always throws PropertyVetoException
- RecognizerProperties.setIncompleteTimeout always throws PropertyVetoException
- RecognizerProperties.setConfidenceLevel always throws PropertyVetoException
- RecognizerProperties.setNumResultAlternatives always throws PropertyVetoException
- RecognizerProperties.setSensitivity always throws PropertyVetoException
- RecognizerProperties.setSpeedVsAccuracy always throws PropertyVetoException
- RecognizerProperties.setTrainingProvided always throws PropertyVetoException
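Because these setters always throw PropertyVetoException in this release, application code that calls them should treat a veto as a normal outcome rather than a fatal error. The sketch below shows one way to do that. To stay runnable without the SDK on the class path, it uses a hypothetical stand-in interface, ConfidenceSetter, in place of RecognizerProperties; only java.beans.PropertyVetoException is the real exception type.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyVetoException;

public class VetoSketch {
    /** Hypothetical stand-in for the single RecognizerProperties setter used here. */
    public interface ConfidenceSetter {
        void setConfidenceLevel(float level) throws PropertyVetoException;
    }

    /** Tries to apply the setting; returns false if the engine vetoes the change. */
    public static boolean trySetConfidence(ConfidenceSetter props, float level) {
        try {
            props.setConfidenceLevel(level);
            return true;
        } catch (PropertyVetoException e) {
            // The engine rejected the value; carry on with its current setting.
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulates an engine that rejects every change, as described above.
        ConfidenceSetter alwaysVetoes = new ConfidenceSetter() {
            public void setConfidenceLevel(float level) throws PropertyVetoException {
                throw new PropertyVetoException("not supported",
                        new PropertyChangeEvent(this, "confidenceLevel", null, level));
            }
        };
        System.out.println(trySetConfidence(alwaysVetoes, 0.5f)); // prints "false"
    }
}
```

In the same spirit, the results of Recognizer.getVocabManager and Recognizer.getSpeakerManager should be checked for null before use.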
The following methods were previously available with limited functionality; they are now fully implemented:
- FinalRuleResult.getNumberGuesses
- FinalRuleResult.getAlternativeTokens(int i)
- FinalRuleResult.getRuleName(int i)
- FinalRuleResult.getRuleGrammar(int i)
Because this implementation is designed to support Java version 1.1, it does not implement the Java Speech security model documented under the SpeechPermission class. That security model depends on features of Java version 1.2, and will be implemented in a future version of Speech for Java designed to support Java 1.2. Therefore it is not recommended that this version of the SDK be installed as a Java 1.2 extension, because doing so will expose the user to the security risks documented under the SpeechPermission class.