Speech for Java v1.0

 

Overview

Speech for Java is a Java SDK that gives Java application developers access to IBM ViaVoice speech technology. The SDK supports voice command recognition, dictation, and text-to-speech synthesis.

The SDK is an implementation of most of Version 1.0 of the Java Speech API. The Java Speech API is a cross-platform speech API that was developed by Sun Microsystems Inc. in collaboration with IBM and other speech technology companies. More information on the Java Speech API can be found at the Java Speech API home page.

Requirements

In much the same way that Java implementations are built on top of the native operating system GUI capabilities, Speech for Java is built on top of the native speech recognition and synthesis capabilities in IBM ViaVoice. The SDK therefore requires ViaVoice to be installed on your computer; ViaVoice is not provided as part of this package. You may find more information about ViaVoice at the ViaVoice Home Page.

Your computer should meet the following minimum requirements for running IBM ViaVoice:

The SDK will take advantage of any enrollment, dictation macros, and added words in your ViaVoice installation.

Windows Installation

After unpacking the installation package to its own directory you should have the following files:

README.html   This file
install.bat   Installation script
lib/ibmjs.jar   The Java portion of the SDK
lib/*.dll   The native portion of the SDK
hello/   The Hello sample application
ref/   Java Speech API reference documentation. See index.

Suppose you have unpacked the installation package to a directory dir (e.g. c:\ibmjs). You should do the following:

Linux Installation

After unpacking the installation package to its own directory you should have the following files:

README.html   This file
install.sh   Installation script
lib/ibmjs.jar   The Java portion of the SDK
lib/*.so   The native portion of the SDK
hello/   The Hello sample application
ref/   Java Speech API reference documentation. See index.

Suppose you have unpacked the installation package to a directory dir (e.g. /usr/ibmjs). You should do the following:

Using the Hello sample

Note: the following description is written assuming that your language is English. The Hello program also now supports French and German. Look at the corresponding hello*.gram and res*.properties files to find the equivalent instructions for other languages.

Note: on Windows, the Hello program requires a full-duplex sound card and driver. Some older sound cards and older drivers don't support simultaneous audio output (synthesis) and audio input (recognition). For some notes on working around this problem, please see the discussion section of the alphaWorks site.

To test your installation, open a command prompt, cd to the hello directory, and run the hello script (on Windows type "hello.bat"; on Linux type "sh hello.sh"). This will start the Hello program. The program will start out by saying "Hello human, my name is Computer, what is your name?" At this point the program is in command mode, and you may say one of the following:

You say   Program does
"My name is first last"   Says "Hello first last"
"Goodbye"   Says "Bye now" and exits
"Repeat after me"   Enters dictation mode
"That's all"   Leaves dictation mode and repeats what you said

Note that since this is just a simple example, the Hello application has a very limited range of first and last names that it understands. You can look at the appropriate hello*.gram file to see which names it knows.

You may add your own name by modifying hello.gram and then restarting the Hello program. To add your own name, follow the pattern that you see in hello.gram. For example, if your name is John Doe, try adding a line that says "| John {John}" to the rule for <first>, and similarly for <last>. You should now be able to say "My name is John Doe" and get an appropriate response.

The part in curly brackets in hello.gram is a tag. Tags are a general-purpose mechanism of the Java Speech Grammar Format (JSGF); in this sample the tag determines how the computer says your name. Try modifying the tag for your name and restarting the Hello program, and see what the computer says when you say "My name is ...".
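The edit described above amounts to appending one more alternative to a JSGF rule. The following is a minimal sketch of that edit expressed in Java, useful mainly for seeing the resulting rule text. The class name, the method name, and the sample rule are hypothetical; the rule shown is an assumed shape and not a copy of the hello.gram shipped with the SDK.

```java
// Sketch only: NameRuleSketch and the sample rule are illustrative assumptions,
// not part of the SDK and not the actual contents of hello.gram.
final class NameRuleSketch {

    /**
     * Appends "| spoken {tag}" to a single JSGF rule definition.
     * The rule text must end with ';' as JSGF rule definitions do.
     */
    static String addAlternative(String rule, String spoken, String tag) {
        String trimmed = rule.trim();
        if (!trimmed.endsWith(";")) {
            throw new IllegalArgumentException("JSGF rule must end with ';': " + rule);
        }
        // Drop the terminating ';', append the new alternative, and close the rule again.
        String body = trimmed.substring(0, trimmed.length() - 1).trim();
        return body + " | " + spoken + " {" + tag + "};";
    }

    public static void main(String[] args) {
        String first = "<first> = Alice {Alice} | Bob {Bob};"; // hypothetical rule

        // Spoken word and tag are the same.
        // prints: <first> = Alice {Alice} | Bob {Bob} | John {John};
        System.out.println(addAlternative(first, "John", "John"));

        // Spoken word and tag differ: the user says "John" but the tag carries "Johnny".
        System.out.println(addAlternative(first, "John", "Johnny"));
    }
}
```

The second call illustrates why the tag is separate from the spoken word: the recognizer matches "John", while the application receives whatever text the tag carries.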

While in command mode, you can say "Repeat after me", at which point the computer will enter dictation mode and say "I'm listening". You may now dictate any text you like and finish by saying "That's all", at which point the program will repeat what you said and go back into command mode.
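If the Hello program does not start at all, one quick check is whether the Java Speech API classes are visible to your Java runtime. The sketch below assumes that the API classes live in the javax.speech package (javax.speech.Central is the API's entry-point class) and that the SDK's jar file is what places them on the classpath; the class name ClasspathCheck is hypothetical.

```java
// Sketch only: ClasspathCheck is an illustrative helper, not part of the SDK.
final class ClasspathCheck {

    /** Returns true if the named class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the Java Speech API entry point is javax.speech.Central.
        String name = "javax.speech.Central";
        System.out.println(name
                + (isOnClasspath(name) ? " found" : " NOT found on the classpath"));
    }
}
```

Run it with the same classpath that the hello script uses. A "NOT found" result suggests a classpath problem rather than a ViaVoice or sound card problem.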

For more information

The source of the Hello program is included to help you understand the basics of writing a speech application using Speech for Java. A good starting point for learning about the Java Speech API is the Programmer's Guide located on the Sun web site.

The following reference documentation is also provided with this package:

For more information and news about the Java Speech API you may refer to the Java Speech API Home Page.

Notes on Speech for Java

The SynthesizerProperties interface allows an application to programmatically change speaking attributes such as pitch, speed, gender, and age. However, as noted in the API specification, the exact timing of synthesizer attribute changes made via the API is not defined. Applications that need to change speaking attributes in real time, synchronized with the words being spoken (for example, to deliver different messages in different voices, pitches, or rates), should therefore use JSML rather than the API to make such changes.

However, currently no mechanism is defined in JSML for selecting a speaking voice with a given age or gender. Therefore, we have temporarily added an IBMVOICE tag to our implementation of JSML, which takes two attributes:

Sun is aware of the requirement for such a feature in JSML. If a mechanism for selecting the age and gender of a voice is added to a subsequent release of the JSML spec, that mechanism will be implemented and the IBMVOICE tag will be removed from our implementation.

Please also be aware that the SDK is based on the beta version of the JSML specification available at the time of the release of the SDK, and that this beta specification is subject to incompatible changes by Sun Microsystems Inc.
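As an illustration of making attribute changes in JSML rather than through SynthesizerProperties, the sketch below builds a JSML string in which one phrase is spoken at a different rate. It assumes the beta JSML markup uses a JSML root element and a PROS element with RATE and PITCH attributes; verify the element and attribute names against the JSML specification that matches your SDK. The class JsmlSketch and its methods are hypothetical helpers, and the IBMVOICE tag is deliberately not used here.

```java
// Sketch only: JsmlSketch is an illustrative helper, not part of the SDK.
// Assumption: the JSML version in use defines <JSML> and <PROS RATE=... PITCH=...>.
final class JsmlSketch {

    /** Escapes characters that would otherwise be read as markup. */
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /** Wraps plain text in a PROS element carrying one prosody attribute. */
    static String pros(String attribute, String value, String text) {
        return "<PROS " + attribute + "=\"" + value + "\">" + escape(text) + "</PROS>";
    }

    /** Wraps already-marked-up content in the JSML root element. */
    static String document(String markup) {
        return "<JSML>" + markup + "</JSML>";
    }

    public static void main(String[] args) {
        String jsml = document(
                escape("Your order total is ")
                + pros("RATE", "-20%", "42 dollars & 10 cents")
                + escape("."));
        System.out.println(jsml);
    }
}
```

With a Java Speech API synthesizer, a string like this would typically be handed to Synthesizer.speak(String, SpeakableListener), so the rate change is applied at the marked words rather than at an undefined point in time.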

Speech for Java is an implementation of most of the Java Speech 1.0 API. A few methods are not yet implemented, including the following:

Recognizer.readVendorGrammar
Recognizer.writeVendorGrammar
Recognizer.readVendorResult
Recognizer.writeVendorResult

The following methods are implemented with limited functionality as described below:

FinalResult.releaseTrainingInfo is a placeholder
FinalResult.tokenCorrection is a placeholder
Recognizer.getVocabManager always returns null
Recognizer.getSpeakerManager always returns null
RecognizerProperties.setCompleteTimeout always throws PropertyVetoException
RecognizerProperties.setIncompleteTimeout always throws PropertyVetoException
RecognizerProperties.setConfidenceLevel always throws PropertyVetoException
RecognizerProperties.setNumResultAlternatives always throws PropertyVetoException
RecognizerProperties.setSensitivity always throws PropertyVetoException
RecognizerProperties.setSpeedVsAccuracy always throws PropertyVetoException
RecognizerProperties.setTrainingProvided always throws PropertyVetoException
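Because the setters listed above always throw PropertyVetoException in this version, portable application code may want to treat a veto as "setting not supported" and continue. The following is a minimal sketch of that pattern. To keep it runnable without the SDK, it uses a hypothetical stand-in interface, ConfidenceSetter, that mirrors the shape of RecognizerProperties.setConfidenceLevel(float); in a real application the call would be made on the RecognizerProperties object obtained from the recognizer. The same defensive style applies to getVocabManager and getSpeakerManager, whose null return values should be checked before use.

```java
import java.beans.PropertyVetoException;

// Sketch only: VetoSketch and ConfidenceSetter are illustrative, not part of the SDK.
final class VetoSketch {

    /** Hypothetical stand-in for the one RecognizerProperties setter used here. */
    interface ConfidenceSetter {
        void setConfidenceLevel(float level) throws PropertyVetoException;
    }

    /** Tries to apply the setting; returns false if the engine vetoes it. */
    static boolean trySetConfidence(ConfidenceSetter props, float level) {
        try {
            props.setConfidenceLevel(level);
            return true;
        } catch (PropertyVetoException e) {
            // The engine rejected the change; keep running with its default value.
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulates the behavior described above: the setter always vetoes.
        ConfidenceSetter alwaysVetoes = level -> {
            throw new PropertyVetoException("not supported", null);
        };
        boolean applied = trySetConfidence(alwaysVetoes, 0.7f);
        System.out.println(applied ? "confidence level set" : "confidence level not supported");
    }
}
```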

New in this version

The following methods were previously available with limited functionality; they are now fully implemented:

FinalRuleResult.getNumberGuesses
FinalRuleResult.getAlternativeTokens(int i)
FinalRuleResult.getRuleName(int i)
FinalRuleResult.getRuleGrammar(int i)

Security issues

Because this implementation is designed to support Java version 1.1, it does not implement the Java Speech security model documented under the SpeechPermission class. The security model described in that class depends on features of Java version 1.2 and will be implemented in a future version of Speech for Java designed to support Java 1.2. Therefore, installing this version of the SDK as a Java 1.2 extension is not recommended, because doing so exposes the user to the security risks documented under the SpeechPermission class.