Internationalizing with JBuilder

This chapter examines issues involved in designing your Java applications to meet the needs of a worldwide audience. Why limit the use of your applet or application only to users in a particular country, when with a little extra effort it could be used by people all around the world? Special features in JBuilder make it easy to take advantage of Java's internationalization capabilities, allowing your applications to be customized for any number of countries or languages without requiring cumbersome changes to the code.

Although this chapter covers specific JBuilder features and is not meant to be an in-depth discussion of Java's internationalization features, it provides several links to related Java documentation to help you get started. Before proceeding to the explanation of internationalization features in JBuilder, please review the following section on commonly used terms that are specific to internationalization.

Internationalization terms and definitions

Internationalization (i18n)
Internationalization is the process of designing a new program, or converting an existing one, so that it can be used in more than one locale. Because of its length, the word is often abbreviated as 'i18n', where 18 represents the number of letters between the 'i' and 'n' in the word "internationalization."
Locale
In general, a locale defines a set of culturally-specific conventions for the display, format, and collation (sorting) of data. In Java, a locale is specified by a Locale object, which is simply a container for strings identifying a particular language and country.
Resourcing
Resourcing is the part of the internationalization process which involves isolating the locale-specific resources in the source code into modules, such that they can be independently added to or removed from the application. Examples of locale-specific resources include text displayed to the user, or possibly even business rules or application logic. Java provides a set of ResourceBundle classes for resourcing strings and objects in Java programs.
Localization (l10n)
Localization is the customization of a program's resources for a particular locale. Note that whereas internationalization generalizes a program for any locale, localization specializes it for a single locale. Because of its length, it is often abbreviated as 'l10n', where 10 represents the number of letters between the 'l' and 'n' in the word "localization."
Native encoding
A native encoding, also commonly known as a character set or codepage, defines a mapping of numeric values to symbolic characters within a particular operating system. Because the native encoding varies by operating system (and sometimes even within the same operating system), a file containing characters on one system may appear to have completely different characters on another system using a different native encoding.
Unicode
Unicode is a universal character encoding standard maintained by The Unicode Consortium (http://www.unicode.org) which defines a character mapping for nearly all the written languages of the world. Any Unicode character can be specified in Java source code by its Unicode escape sequence, \uNNNN, where NNNN is the hexadecimal value of the character in the Unicode character set. Characters and strings are always processed as 16-bit Unicode-encoded values within the Java Virtual Machine.
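
As a brief illustration of the Locale and Unicode definitions above, the following sketch builds a label from the two strings a Locale carries and stores a character by its Unicode escape. The class and method names (LocaleSketch, describe) are invented for this example and are not part of JBuilder or the JDK.

```java
import java.util.Locale;

class LocaleSketch {
    /** Joins the two identifying strings of a locale, e.g. "de_DE". */
    static String describe(Locale locale) {
        return locale.getLanguage() + "_" + locale.getCountry();
    }

    public static void main(String[] args) {
        // Predefined constants for German (Germany) and French (Canada).
        System.out.println(describe(Locale.GERMANY));
        System.out.println(describe(Locale.CANADA_FRENCH));

        // A single character written as a Unicode escape (hex value 5C71).
        String mountain = "\u5C71";
        System.out.println(mountain.length());
    }
}
```

The Locale object itself performs no formatting or sorting; it only identifies the conventions that locale-sensitive classes should apply.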

Internationalization features in JBuilder

JBuilder includes a number of features designed to help you easily internationalize your Java applets and applications. The following features are discussed in this section:

Multilingual Sample Application

JBuilder includes an extensive multilingual sample order entry application demonstrating many of the important internationalization concepts in detail. This sample also illustrates many other important features of JBuilder such as building components with JBCL, creating internationalized JavaBeans, and using the DataExpress architecture. You can find the "IntlDemo.jpr" project located under the /samples/borland/samples/intl directory of your JBuilder installation. Please refer to the IntlDemo.html documentation file and source code for more detailed information. The IntlDemo sample supports and includes translations for 12 different locales.

The Borland Multilingual International Store's LocaleChooser JavaBean lets you switch the application's locale at runtime. Doing so automatically adapts the GUI to the language and conventions for the selected locale.

The ProductFrame lets users see images of Borland Store products and written descriptions in their own language. Note how the buttons and labels adjust their sizes automatically for the different Japanese and German translations shown here.

The OrderFrame displays the address of the customer and the cost of the order in the appropriate format for the user's locale. The OrderFrame is shown in French here:

Eliminating hard-coded strings using the Resource wizard

A common design error that prevents your application or applet from easily being localized is the inclusion of hard-coded strings in your source code that are displayed in the GUI of your application or applet.

While you can resource hard-coded strings in your user interface after you've completed and tested your source code, it's better to resource visible strings as part of the GUI design process.

Resourcing your GUI as you write it provides two major advantages:

JBuilder provides two ways to get these benefits with minimal effort: the Resource wizard and the Localizable Property Setting dialog.

The Resource wizard scans your source code and allows you to quickly and easily move hard-coded strings into Java ResourceBundle classes. This wizard works with any Java file, not just source code generated by JBuilder.
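
The following sketch suggests what resourced strings might look like once they have been moved out of the GUI code. It is not the output of the Resource wizard; the bundle class names (ButtonRes, ButtonRes_de), the keys, and the bundleFor helper are all hypothetical. To keep the example in a single file, the bundles are instantiated directly; an application would normally let ResourceBundle.getBundle select the bundle for a locale.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

class ResourcingSketch {
    /** Default (English) strings; keys are hypothetical. */
    static class ButtonRes extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"ok", "OK"}, {"cancel", "Cancel"}, {"help", "Help"},
            };
        }
    }

    /** German translation of the same keys. */
    static class ButtonRes_de extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"ok", "OK"}, {"cancel", "Abbrechen"}, {"help", "Hilfe"},
            };
        }
    }

    /** Simplified stand-in for ResourceBundle.getBundle(baseName, locale). */
    static ResourceBundle bundleFor(String language) {
        return "de".equals(language) ? new ButtonRes_de() : new ButtonRes();
    }

    public static void main(String[] args) {
        // Before resourcing, the label would be the hard-coded string "Cancel".
        // After resourcing, the label is looked up by key at runtime.
        System.out.println(bundleFor("en").getString("cancel"));
        System.out.println(bundleFor("de").getString("cancel"));
    }
}
```

Because the GUI code refers only to keys, a translator can supply a new bundle without touching the source that builds the user interface.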

The Localizable Property Setting dialog allows you to resource visible strings as you create or customize components in your GUI. Simply right-click any text property, such as the label of a ButtonControl, and select the ResourceBundle option to display the Localizable Property Setting dialog. This dialog displays options similar to those in the Resource wizard, but includes only the options that affect the selected property. The dialog is initialized with intelligent defaults so that in most cases you do not have to make further customizations. Since resourcing from the Component Inspector is so quick and convenient, you can easily make it an integral part of customizing the components in your application.

JBCL internationalization features

The JBCL architecture includes several design decisions which facilitate internationalization of an application or applet:

Using locale-sensitive JBCL components

In addition to being fully resourced, many JBCL components also provide useful locale-sensitive behavior. For example, string data that is loaded into a Column of a GridControl using DataExpress DataSet components is automatically sorted according to the default collation order for the user's runtime locale. Similarly, date, time, and numeric values are automatically formatted correctly for the user's locale.

The following images show two views of the same DataSet column data, as seen by users in the English (United States) and German (Germany) locales.

By default, objects inherit the locale of their containers. Therefore the locale setting on a DataSet will be used by default by Columns within the DataSet. Alternatively, a locale can be specified explicitly for each Column object within the DataSet. This might be useful if, for example, each Column held data which needed to be sorted by a different locale. Refer to the JDK's API documentation about the Collator class for more information about locale-sensitive sorting. For more information about the locale-sensitive formatting of data types in Java, refer to the DateFormat, NumberFormat, and MessageFormat classes in the JDK API documentation.
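
The JDK classes mentioned above can be tried directly, without any JBCL components. The sketch below sorts strings with a Collator and formats a number with a NumberFormat for a given locale. The class and method names (LocaleSensitiveSketch, sortFor, formatNumber) are invented for this example; it illustrates the JDK behavior only and does not show how DataExpress applies it internally.

```java
import java.text.Collator;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.Locale;

class LocaleSensitiveSketch {
    /** Returns a copy of the words sorted by the locale's collation rules. */
    static String[] sortFor(Locale locale, String... words) {
        String[] sorted = words.clone();
        Arrays.sort(sorted, Collator.getInstance(locale));
        return sorted;
    }

    /** Formats a number with the locale's grouping and decimal separators. */
    static String formatNumber(Locale locale, double value) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        // The second word begins with A-umlaut, written as a Unicode escape.
        String[] words = {"Zucker", "\u00C4rger", "Apfel"};

        // Plain String ordering compares character values, so the
        // A-umlaut word sorts after every unaccented word.
        String[] byValue = words.clone();
        Arrays.sort(byValue);
        System.out.println(Arrays.toString(byValue));

        // German collation places the A-umlaut word next to the other A words.
        System.out.println(Arrays.toString(sortFor(Locale.GERMANY, words)));

        System.out.println(formatNumber(Locale.US, 1234.5));
        System.out.println(formatNumber(Locale.GERMANY, 1234.5));
    }
}
```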

JBCL components can display any Unicode character

Component architectures which rely solely upon native UI peer controls to display characters can only display the set of characters supported by the native peer. Because the JBCL uses Java to display characters rather than native peers, it is able to display any Unicode character for which a font has been installed on your system, regardless of whether or not that character actually exists in your operating system's default character set. To display such a character:

  1. Install the desired font on your operating system.
  2. Modify the JDK font.properties file for your locale, specifying that the font for that character is now available.
For instructions on how to do this, refer to Adding Fonts to the Java Runtime in the JDK Internationalization documentation.

Internationalization features in the UI Designer

JBuilder's UI Designer is a powerful tool for the creation and verification of your internationalized GUI design. As you add translatable text elements to your GUI, you can instantly put them into resource bundles. The Inspector automatically reads strings from and writes them back to resource bundles for you. In addition, after you've resourced all the text of your GUI and have received a localized resource bundle from your translator, you can use the designer to quickly build and verify your internationalized user interface.

The Inspector displays locale-sensitive short description information about a JavaBean's property, as described in the internationalization section of the JavaBeans specification.

The Inspector allows the use of Unicode character escape sequences to denote characters that cannot be entered directly via the keyboard under your operating system locale. When you want to insert a Unicode character into a string property you're editing, simply put the hexadecimal value of the character's Unicode escape sequence within angle brackets. For example, to insert the Japanese character for the word "mountain" into the label of a button, enter "<5C71>". If your system has Japanese fonts installed and the proper settings in your JDK font.properties file, the character will be displayed as the label of the button, and the Unicode escape "\u5C71" will appear in your source code.
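
To make the angle-bracket notation concrete, the following sketch expands it into the characters it denotes. This is only an illustration of the convention described above, not the Inspector's actual implementation; the class name AngleBracketEscapes and the method expand are hypothetical, and the sketch assumes that exactly four hexadecimal digits appear between the brackets.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AngleBracketEscapes {
    // Four hex digits inside angle brackets, for example <5C71>.
    private static final Pattern HEX = Pattern.compile("<([0-9A-Fa-f]{4})>");

    /** Replaces each bracketed hex value with the character it names. */
    static String expand(String input) {
        Matcher m = HEX.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' or '\' in the replacement.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String label = expand("<5C71>");
        // The expanded label is the same string as the source-code escape.
        System.out.println(label.equals("\u5C71"));
    }
}
```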

The UI Designer provides excellent support for dynamic layout managers, a crucial requirement for building internationalized GUI designs. Building a single GUI capable of supporting multiple languages is a difficult task, but the UI Designer's support for Java's dynamic AWT layout managers makes it much easier. When designing a GUI intended to be localized for more than one language, an extremely important rule is to always use a dynamic layout manager. Consider, for example, the following Dialog containing OK, Cancel, and Help buttons built using an XYLayout.

This displays as expected for English labels, but when the labels are translated into German, the text of the labels is too long to fit completely within the fixed button size. This is a very common problem that almost always occurs when attempting to localize a non-internationalized GUI.

The solution is to use one or more dynamic AWT layout managers to allow the buttons to grow based on their label width. Here are the English and German internationalized versions of the same Dialog, written using a panel with a dynamic GridLayout for the buttons, embedded within a BorderLayout Dialog.
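
A hand-written equivalent of that arrangement might look like the sketch below, which uses plain AWT classes rather than code generated by the UI Designer. The class and method names (DynamicLayoutSketch, buttonRowLayout, buildButtonRow, buildDialogContent) and the 5-pixel gap are assumptions made for this example.

```java
import java.awt.BorderLayout;
import java.awt.Button;
import java.awt.GridLayout;
import java.awt.Panel;

class DynamicLayoutSketch {
    /** One row, as many equally sized columns as needed, 5-pixel gap. */
    static GridLayout buttonRowLayout() {
        return new GridLayout(1, 0, 5, 0);
    }

    /** Builds a row of buttons; no pixel sizes are specified anywhere. */
    static Panel buildButtonRow(String... labels) {
        Panel row = new Panel(buttonRowLayout());
        for (String label : labels) {
            row.add(new Button(label));
        }
        return row;
    }

    /** Places the button row at the bottom of a BorderLayout container. */
    static Panel buildDialogContent(String... labels) {
        Panel content = new Panel(new BorderLayout());
        content.add(buildButtonRow(labels), BorderLayout.SOUTH);
        return content;
    }
}
```

Because GridLayout sizes every cell to fit the widest button, calling buildDialogContent with "OK", "Abbrechen", and "Hilfe" should leave room for the longer German labels without any change to the layout code. Creating Button objects requires a graphical environment.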

To learn more about creating dynamic layouts using the UI Designer, refer to the Using layout managers section of Building Applications with JBuilder. The multilingual international sample application also demonstrates some advanced techniques for updating the layout of Frames in an application at runtime.

Unicode in the IDE Debugger

The JBuilder debugger allows you to view and edit Unicode characters, even if your operating system does not support them. When examining values in the debugger's watch pane, expand the value in the tree you want to inspect until you can see its primitive Java type. By default, the debugger tries to display the Unicode character, assuming that your operating system can display it.

To view the character's Unicode escape sequence, right-click the value and select the Show Hex Value... option. You can also change the value by selecting Modify Value and entering another Unicode escape sequence in the Change Data Value dialog box.

Specifying a native encoding for the compiler

The JBuilder and javac compilers compile source code encoded in native encodings (also known as local codepages), which is the storage format used by most text editors, including the JBuilder editor.

The IDE and compiler support all JDK native encodings. All JBuilder compilers automatically select the appropriate native encoding for your operating system's locale. You can also specify any JDK encoding for compiling source code files which were written in a different native encoding.

You can specify an encoding name to control how the compiler interprets characters beyond the English (ASCII) character set. You can specify the encoding on a project-wide basis, or with the encoding compiler switch from the command line. If no setting is specified for this option, the default native encoding converter for the platform is used.

Under Western European versions of Windows, including the United States version, the javac and JBuilder compilers assume the encoding of 8859_1, even though the actual encoding should be Cp1252. Cp1252 contains some characters that are not in 8859_1. If your source file contains these additional characters, they will not be correctly interpreted. In this case, you should specify Cp1252 as the encoding.
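
The difference between the two encodings can be observed by decoding the same byte with each of them. The sketch below uses the byte value 0x99, which Cp1252 maps to the trademark sign and 8859_1 maps to a control character. The class and method names (EncodingMismatchSketch, decode) are invented for this example, and it assumes that your Java runtime accepts the encoding names Cp1252 and 8859_1.

```java
import java.nio.charset.Charset;

class EncodingMismatchSketch {
    /** Decodes raw bytes using the named encoding. */
    static String decode(byte[] bytes, String encodingName) {
        return new String(bytes, Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0x99};

        // Cp1252 yields the trademark sign (hex 2122).
        System.out.println(Integer.toHexString(decode(raw, "Cp1252").charAt(0)));

        // 8859_1 yields a control character with the same value as the byte.
        System.out.println(Integer.toHexString(decode(raw, "8859_1").charAt(0)));
    }
}
```

A source file saved as Cp1252 but compiled as 8859_1 would suffer the same substitution for every character in this range.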

To set the encoding option from within the IDE, choose File|Project Properties to display the Project Properties dialog box. On the Compiler page, select an encoding name from the Encoding drop-down list. At the command line, use either bcj or bmj's -encoding option followed by the encoding name to specify an encoding.

Native encodings supported


8859_1
8859_2
8859_3
8859_4
8859_5
8859_6
8859_7
8859_8
8859_9
Big5
CNS11643
Cp1250
Cp1251
Cp1252
Cp1253
Cp1254
Cp1255
Cp1256
Cp1257
Cp1258
Cp437
Cp737
Cp775
Cp850
Cp852
Cp855
Cp857
Cp860
Cp861
Cp862
Cp863
Cp864
Cp865
Cp866
Cp869
Cp874
EUCJIS
GB2312
JIS
KSC5601
MacArabic
MacCentralEurope
MacCroatian
MacCyrillic
MacDingbat
MacGreek
MacHebrew
MacIceland
MacRoman
MacRomania
MacSymbol
MacThai
MacTurkish
MacUkraine
SJIS
Unicode
UnicodeBig
UnicodeLittle
UTF8

Two encoding names have special meaning:

null
Specifies that no native-encoding conversion should be done. Each byte in the file is converted to Unicode by setting it to the lower byte of the Unicode character. The upper byte of the Unicode character is set to zero.
default
Equivalent to not specifying an encoding option. This uses the default encoding of the user's environment.
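
The byte-to-character rule described for the null encoding can be sketched in a few lines. This is an illustration of the rule only, not the compiler's own code; the class name NullEncodingSketch and the method widen are hypothetical.

```java
class NullEncodingSketch {
    /** Copies each byte into the lower byte of a char; the upper byte is zero. */
    static String widen(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            // Masking with 0xFF prevents sign extension of values above 0x7F.
            chars[i] = (char) (bytes[i] & 0xFF);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        byte[] raw = {0x41, (byte) 0xE9};
        String text = widen(raw);
        System.out.println(Integer.toHexString(text.charAt(0)));
        System.out.println(Integer.toHexString(text.charAt(1)));
    }
}
```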

For a description of each encoding, see the JDK Internationalization Specification: Character Set Conversion: Supported Encodings (http://www.javasoft.com:80/products/jdk/1.1/intl/html/intlspec.doc7.html). The following descriptions supplement that section:

Unicode
Unicode, with big or little endian indicated by Byte-Order-Mark.
UnicodeBig
Big-Endian Unicode.
UnicodeLittle
Little-Endian Unicode.
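
The effect of byte order can be seen by encoding a single character in each form. The sketch below uses the standard charset constants UTF_16BE, UTF_16LE, and UTF_16 from current Java runtimes as stand-ins for the byte orders described above; treating them as equivalent to the JDK 1.1 names UnicodeBig, UnicodeLittle, and Unicode is an assumption of this example. The class name ByteOrderSketch and the method hex are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ByteOrderSketch {
    /** Renders bytes as uppercase hex digits, two per byte. */
    static String hex(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String mountain = "\u5C71";

        // Big-endian: high byte first.
        System.out.println(hex(mountain.getBytes(StandardCharsets.UTF_16BE)));
        // Little-endian: low byte first.
        System.out.println(hex(mountain.getBytes(StandardCharsets.UTF_16LE)));
        // With a byte-order mark in front of the character.
        System.out.println(hex(mountain.getBytes(StandardCharsets.UTF_16)));

        // When decoding, the byte-order mark selects the byte order.
        byte[] littleWithMark = {(byte) 0xFF, (byte) 0xFE, 0x71, 0x5C};
        System.out.println(new String(littleWithMark, StandardCharsets.UTF_16).equals(mountain));
    }
}
```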

More about native encodings

Non-Unicode environments represent characters using different encoding systems. In the PC world, these are known as codepages; Java refers to them as native encodings. Because each system can have a different set of extended characters, data must be converted when it moves from one encoding system to another; otherwise, characters can be lost or misinterpreted.

Most text editors, including JBuilder's editor, write text in the native encoding. For example, Japanese Windows uses the Shift-JIS format, and US Windows uses Windows Codepage 1252. Starting with JDK 1.1, javac is also able to compile "native-encoded" source code. The encoding can be specified by using the "encoding" switch. When the encoding is not specified, the compiler uses the encoding based on the user's environment.

Unlike Unicode, source code written with native encoding is not directly portable to systems using other encodings. For example, if source code has been encoded in Shift-JIS (a Japanese encoding), and you are running the compiler in a US Windows environment, you must specify the Shift-JIS encoding for the compiler to read the source correctly.

The 16-bit Unicode format

Unicode is a universal system of representing characters using 16-bit numbers. The 16-bit Unicode character set can be supported directly, or can be represented indirectly within the 7-bit ASCII character set, using the \u escape character followed by four hexadecimal digits.

When all major operating environments directly support Unicode, this will replace the established approach, which requires conversion between different native encodings with conflicting character values. Java is one of the first environments to standardize on Unicode; Unicode is the internal character set of the Java environment.

Unicode support using ASCII and '\u'

Currently, most Windows text editors, including JBuilder's editor, store and process text as 7- or 8-bit characters, rather than 16-bit Unicode characters. The ASCII character set uses a 7-bit encoding that contains the letters of the English alphabet, the digits, and some symbols and control codes. Almost all native encodings have ASCII as a subset, and represent it in the same way: the first 128 characters of an encoding are the ASCII character set. The ASCII character set can be considered a subset of Unicode.

To enable users to specify Unicode characters in their source code without a Unicode-enabled editor, the Java specification allows the use of the \u "Unicode escape" in an ASCII file. This usage enables extended characters to be represented by a combination of ASCII characters. This way of representing Unicode uses 6 characters to represent each non-ASCII character. To enter an ordinary ASCII character, you press the character's key on the keyboard, and to enter a non-ASCII character, you type in the Unicode escape sequence representing the character.

In this 7-bit representation of Unicode, each character beyond the ASCII character set is represented in the form \uNNNN, where NNNN are the 4 hex digits of the Unicode character. For example, the Unicode character "Latin Small Letter F with Hook", a cursive 'f' which is represented in Unicode with the hexadecimal number 0192, can be entered by typing "\u0192".
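
The conversion from 16-bit characters to this 7-bit form can be sketched as follows. The helper is an illustration only and is not a JBuilder or JDK tool; the class name UnicodeEscapeSketch and the method toAsciiEscaped are hypothetical.

```java
class UnicodeEscapeSketch {
    /** Keeps ASCII characters as they are and escapes every other character. */
    static String toAsciiEscaped(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                // Backslash, the letter u, then four hex digits: six characters.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input: the letter f followed by Latin Small Letter F with Hook.
        String escaped = toAsciiEscaped("f\u0192");
        System.out.println(escaped);
        System.out.println(escaped.length());
    }
}
```

The escaped text contains only ASCII characters, so any editor or native encoding that has ASCII as a subset can store it without loss.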

Unicode, in both the 16-bit and 7-bit forms, is in a universal format; source code in Unicode is directly portable to all platforms, in all languages.

JBuilder Around the World

JBuilder is available in several languages, including English, German, French, and Japanese. Localized versions include translated printed and online documentation, a translated UI, and a translated JBCL library. Localized versions of JBuilder are available for purchase from the Borland sales office in those countries.

Online internationalization support

Visit the multi-lingual-apps newsgroup on the Borland Web page at news://forums.borland.com/borland.public.jbuilder.multi-lingual-apps. This newsgroup is dedicated to JBuilder internationalization and multilingual issues and is actively monitored by our support engineers as well as R&D and QA engineers in the JBuilder internationalization group.