Win4Lin User's Guide

Appendix B -- National language support


The first part of this appendix is an overview of Win4Lin National Language Support (NLS). The second part has information for configuring the NLS features of Win4Lin.

When Win4Lin is installed, it attempts to configure itself for the same language (or locale) and keyboard as your Linux system. If this automatic configuration is not correct, refer to ``Setting up the Win4Lin NLS environment''.

Win4Lin national language support

Win4Lin NLS includes the following features:

This appendix assumes you have some familiarity with DOS code pages. Refer to DOS documentation for more complete information on NLS-related DOS commands such as CHCP, COUNTRY, KEYB, MODE, and NLSFUNC.


NOTE: The standard setup for U.S. English systems is for NLS features to be disabled. See ``Setting up the Win4Lin NLS environment'' if you want to enable these features.

Code pages and code sets

Both DOS and the Linux operating system use tables that determine how the numeric codes stored in the computer are converted into displayed letters, numbers, symbols, and other characters. The DOS tables are called code pages, and the Linux tables are called code sets.

DOS uses several different code pages to fill the needs of most Western European languages.

The Linux operating system, on the other hand, uses only one code set, ISO 8859-1, to handle most Western European languages. Win4Lin supports that code set. Other code set support could be added in the future.

If you use only DOS or only the Linux system, you do not need to be aware of any differences between DOS code pages and Linux code sets. When you use Win4Lin to combine DOS and the Linux system, however, you may notice some differences between the DOS and Linux environments. Some characters in data and file names created with one operating system may be displayed as different characters when you view them with the other operating system. For example, suppose you have a DOS file called memo containing the following text:

   This memo describes the features of our æNET product.

If you issue the command cat memo from the Linux shell, the words might be displayed like this:

   This memo describes the features of our *NET product.

Similar character conversions can occur when you view DOS text on different types of terminals. These character transformations occur because a code number represents each character. The same code number may exist in both your current DOS code page and Linux code set, but it may be matched with a different symbol in each.
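
The mismatch can be reproduced with any tool that knows both tables. The following is a minimal Python sketch, not part of Win4Lin. It assumes the memo was written under DOS code page 437 and uses Python's built-in cp437 and latin-1 (ISO 8859-1) codecs as stand-ins for the DOS code page and the Linux code set. The function name is hypothetical.

```python
# Illustrative only: Python's built-in codecs stand in for the DOS code page
# and the Linux code set. This is not how Win4Lin itself handles characters.

def same_byte_two_tables(byte_value):
    """Return the character one byte maps to in CP437 and in ISO 8859-1."""
    raw = bytes([byte_value])
    return raw.decode("cp437"), raw.decode("latin-1")


dos_char, linux_char = same_byte_two_tables(0x91)
print(repr(dos_char))    # the ae ligature in code page 437
print(repr(linux_char))  # a non-printing control code in ISO 8859-1
```

The same stored number (0x91) is a letter in one table and a control code in the other, which is why a terminal may show a substitute character.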

This appendix explains how to set up your DOS and Linux environments so you get the most consistent behavior when you combine the DOS and Linux operating systems. ``Converting text files'' also describes the Win4Lin dos2unix, unix2dos, and charconv programs, which you can use to convert text files in a variety of different ways.

How DOS handles NLS

Different kinds of code pages

DOS recognizes two kinds of code pages. A hardware code page is built into a hardware device. A software code page is provided in software form and stored in code page information (.cpi) files.

In the U.S., hardware devices commonly use code page 437 by default. Hardware devices designed for use in other countries use other code pages by default.

Not all devices can recognize all software code pages. Some printers, for instance, can print only the symbols contained in their hardware code pages. Thus, a particular character may be stored in the computer's memory as part of a software code page, but the attached printer may be unable to print that symbol correctly when it is sent. Some DOS monitors are limited in the characters they can display for the same reason. See ``Display considerations'' for further details.

Not all code pages and code sets are supported for every operation. A particular character set may be viewable on the console, for instance, but not viewable at an attached terminal or printable by the printers attached.

Code page switching

DOS allows you to work alternately in several languages on the same machine by switching code pages. You can work first in the characters of one language, then switch code pages to work in the characters of another language. Each device affected must be prepared ahead of time with the DOS MODE command. The actual switching is done with the CHCP (change code page) command.

Tailoring DOS for different languages

The following list summarizes the procedures required to set up a standard DOS computer for a different character set. Refer to DOS documentation for more complete information about following these procedures on a conventional DOS computer. Specific examples of these procedures as they apply to the Win4Lin environment appear later in this appendix.

COUNTRY
Use COUNTRY to specify the country. Place this command in the config.sys file. The command sets country-specific variables, such as the formats used for dates, times, and currencies. It also establishes the character collation sequence.

KEYB
Use the KEYB command to map the keyboard.

DEVICE
Use DEVICE= statements in the config.sys file to configure system devices. These statements tell DOS the hardware and software code pages to associate with each device.

NLSFUNC
Place the NLSFUNC command in the autoexec.bat file to load memory-resident NLS code. You must do this before you try to specify code pages or keyboard codes.

CODEPAGE PREPARE
Place the command:
    MODE device CODEPAGE PREPARE
in the autoexec.bat file. This prepares code pages for those devices that support code-page switching.

CHCP
Use the CHCP command to switch between code pages.

How the Linux operating system handles NLS

A Linux operating system must perform the same basic activities described above for DOS, but the system interface is different. (Refer to your Linux documentation for how to configure NLS for your Linux system.)

The Linux locale command prints out the current NLS setup. Win4Lin uses the LANG setting to determine the language and character set. LANG is of the form:

LANG=language_territory.codeset

For instance, LANG=fr_FR.8859-1 sets the language to French as spoken in France, and uses ISO Latin-1 (ISO 8859-1) as the character set. All dates, times, and currencies are expressed as is customary in that region.
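
As a rough illustration of the language_territory.codeset form, the Python sketch below splits a LANG value into its three parts. The function name parse_lang is hypothetical, and this is not the parsing code Win4Lin uses.

```python
# Illustrative sketch: split a LANG value of the form
# language_territory.codeset into its parts.

def parse_lang(value):
    """Return (language, territory, codeset); missing parts are ''."""
    locale_part, _, codeset = value.partition(".")
    language, _, territory = locale_part.partition("_")
    return language, territory, codeset


print(parse_lang("fr_FR.8859-1"))  # ('fr', 'FR', '8859-1')
```

A value with no territory or code set, such as C, simply yields empty strings for the missing parts.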

NLS features of Win4Lin

Display considerations

A VGA console can display any character set for which the DOS font files are loaded. The nature of your display, however, limits the availability of automatic character conversion and the other NLS features offered by Win4Lin.

With Win4Lin you run DOS in a window on your X display, using X fonts to display characters. The availability of X fonts for particular DOS code pages therefore limits which DOS code pages can be fully used. When Win4Lin is installed, at least the fonts for code pages 437 (U.S.) and 850 (Western European) are installed.

The DOS X font files that Win4Lin installs are in the directory /opt/win4lin/xc/fonts.

Win4Lin automatically attempts to use the font that matches the current DOS code page setting. When such a font is not installed, a default font is used, which might or might not be suitable.

If you have them, you can install new DOS X fonts for Win4Lin to use. Name each font according to the scheme SSSSpcCCC, where SSSS is the size ("6x13" for small and "8x14" for medium) and CCC is the code page number. For example, if you have an 8x14 font for DOS code page 861, install it as "8x14pc861".
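
The naming scheme is simple enough to express in a few lines. The Python sketch below builds a font name from a size and a code page number. The helper name dos_font_name is hypothetical, and the check accepts only the two sizes this guide mentions.

```python
# Illustrative sketch of the SSSSpcCCC font naming scheme.

def dos_font_name(size, code_page):
    """Build a DOS X font name such as '8x14pc861'."""
    if size not in ("6x13", "8x14"):
        raise ValueError("this guide lists only 6x13 (small) and 8x14 (medium)")
    return f"{size}pc{code_page}"


print(dos_font_name("8x14", 861))  # 8x14pc861
```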

Setting up the Win4Lin NLS environment

Win4Lin automatically configures the DOS NLS environment to match your Linux NLS settings. Because an exact match between DOS and Linux NLS environments is not possible, Win4Lin makes some assumptions when creating your DOS NLS environment. You can adjust these settings, if necessary, by making changes to Win4Lin's DOS NLS configuration data.

You can view or modify the current setting using the Locale Settings window of Win4Lin Setup. You must be logged in as root to change NLS settings; otherwise, you can only view the current settings.

Follow these steps to access the Locale Settings window:

  1. Invoke the Win4Lin Setup utility.

  2. Click System-Wide Win4Lin Administration and then click OK.

  3. Click View/Modify Locale settings.

The current settings are shown. On the left is a list of Locales you can choose from, and on the right are the current NLS settings for Country, Codepage, and Keyboard.

When Win4Lin is installed on a U.S. English system, or on a system for which it cannot automatically determine the Linux NLS configuration, it is automatically configured for no DOS NLS support. In this case, the locale No Locale is highlighted, the Country and Codepage are zero, and there is no Keyboard setting.

Click OK to save any change you have made. If you change from No Locale to another setting, there will be a pause while Win4Lin rebuilds some files when you click OK. This process only takes a short time. New settings take effect for any new DOS or Windows sessions.

You can override these NLS settings by setting one or all of the Linux environment variables COUNTRY, CODEPAGE, and KEYB. This way a user can have a different setting from the global setting. However, for this to work completely, the global setting cannot be No Locale.

The COUNTRY and KEYB variables are set to the same values shown in the Locale Settings window for Country and Keyboard. The CODEPAGE variable is set to the same value as the Codepage setting but with the two letters pc in front (for example, pc437).
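
To make the pc prefix rule concrete, the Python sketch below formats the three override variables and prints Bourne shell assignments for them. The helper name nls_overrides and the sample values (049, 850, gr) are assumptions chosen for illustration. Use the values your own Locale Settings window shows.

```python
# Illustrative sketch: format the COUNTRY, CODEPAGE, and KEYB overrides.
# The sample values below are assumptions, not values taken from Win4Lin.

def nls_overrides(country, codepage, keyboard):
    """Return the three variables in the form described above."""
    return {
        "COUNTRY": str(country),
        "CODEPAGE": f"pc{codepage}",  # Codepage setting with "pc" in front
        "KEYB": str(keyboard),
    }


for name, value in nls_overrides("049", 850, "gr").items():
    print(f"{name}={value}; export {name}")
```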

LANG for DOS/Windows

Win4Lin automatically sets LANG in the DOS/Windows environment based on the Linux locale LANG setting. (But if the locale is the "C" or "POSIX" locale then LANG is NOT automatically set in the DOS/Windows environment.)

If this automatic setting is not done correctly for your locale, you can override it by putting the setting you want in the Linux WIN_LANG environment variable. (This is also the way to have different users on the same system use different LANG settings in their Win4Lin sessions.) You can also make this the default for all users by putting this setting in the file "/etc/default/merge".

For example, if you want to force the LANG setting in DOS or Windows sessions to be "fr_CH" for all users, put this line in "/etc/default/merge":

	WIN_LANG=fr_CH

Windows 95/98 keyboard setup

Each Windows 95/98 installation needs to have its keyboard configured. Do this the normal Windows 9x way, using the Keyboard control panel as you would on a conventional Windows 9x machine.

WARNING:

File name considerations

The section "About drives and file systems" in Chapter 3 describes how Win4Lin treats DOS and Linux file names, including the Win4Lin file name mapping feature. File name mapping can be confusing on its own, and it becomes more complicated when DOS or Windows uses a different code set than Linux for file names. You can avoid these pitfalls by using only characters that all DOS code pages and Linux code sets contain in your file names. These characters are a to z, 0 to 9, the dot (.), the hyphen (-), and the underscore (_).
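
A quick way to check names against this safe set is sketched below in Python. The function name is_portable_name is hypothetical. The pattern covers exactly the characters listed above, so uppercase letters are rejected as well.

```python
import re

# Characters listed in this guide as common to all DOS code pages and
# Linux code sets: a to z, 0 to 9, dot, hyphen, underscore.
_SAFE_NAME = re.compile(r"[a-z0-9._-]+")


def is_portable_name(name):
    """Return True if the file name uses only the safe characters."""
    return _SAFE_NAME.fullmatch(name) is not None


for candidate in ("memo-1.txt", "m\u00e9mo.txt", "Memo.txt"):
    print(candidate, is_portable_name(candidate))
```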

Printing in an NLS environment

By default, Win4Lin sends DOS printer output to the Linux printing system.

Win4Lin by default does not translate DOS text when you print. If the Linux printer used for DOS printing does not support all the characters you send for printing, the results are unpredictable.

You cannot use the DOS MODE or CHCP commands to change the printer code page when you use a Linux printer. However, when you use the Linux print spooler, you can use the Win4Lin charconv command to translate DOS text files before you print them. For example, assume your DOS text file memo was created using DOS code page 850; you want to print it using the Linux ISO standard code set (8859) and convert characters that do not exist in code set 8859 to the best multibyte approximation. You can convert memo using the command:

charconv /i pc850 /o 8859 /m /d memo memo.unx

You can then print the converted file using standard DOS or Linux commands (for example, copy memo.unx lpt1).

You can also use the Win4Lin printer command to change the default Linux print command that Win4Lin uses when you send DOS printer output to the Linux spooler. For example, if you want to use the charconv command shown above whenever you print from DOS, you can issue the following command at the DOS prompt or in your autoexec.bat file:

printer unix "charconv -i pc850 -o 8859 -m -d | lp"

After you issue this printer command, all text files you print from DOS are converted automatically.


NOTE: The examples using the charconv and printer unix commands in this section are relevant only for text files. You cannot convert DOS graphics or other non-text files before printing.

Refer to ``Converting text files'' for more information on the charconv command. See ``Using the printer command'' in Chapter 6 for more information on the printer command.

You may also have to use special options with the Linux "lp" (or "lpr") command you use to print the file. For example, on UnixWare 7 in Germany you use the "-L de" option to specify the German (de) locale, and you might need to use the "-D de" option for printing text files. The default printing configuration is likely to need various adjustments to print properly, and you will have to work out these locale-specific adjustments yourself.

Instead of using the Linux print spooler, you can attach a printer directly to your DOS process. When a printer is directly attached to DOS, you can use the DOS MODE or CHCP commands to change printer code pages and use the printer in all other ways exactly as you would with a conventional DOS computer. See ``Configuring printers for direct attachment'' in Chapter 5 for further information on setting up and using a directly attached printer.

Converting text files

Win4Lin has three commands that you can use to convert text files so they are usable with different code pages and code sets: dos2unix, unix2dos, and charconv. You can use these commands both in the DOS environment and at the Linux shell prompt.

Using dos2unix and unix2dos

``Working with DOS and Linux files'' describes how to use dos2unix and unix2dos to convert text in DOS format to Linux format and Linux to DOS format. In addition, dos2unix and unix2dos by default translate each character in your text from the DOS code page to the corresponding character in the Linux code set or the reverse. When you use these commands in the DOS environment, Win4Lin translates between your current DOS code page and the Linux code set defined by the LANG locale setting. When you use them at the Linux shell, Win4Lin translates between the DOS code page defined by the CODEPAGE environment variable and the Linux code set.

When all characters in your text file exist in both the code page and the code set, the converted output looks exactly the same as your original, unconverted text file (except that it is now usable with a different operating system). When your original text file contains characters that do not exist in both the code page and code set, dos2unix and unix2dos by default convert untranslatable characters into asterisks (*).
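
The default behavior described above can be imitated in Python to show what to expect. This is a sketch under assumptions, not the dos2unix implementation: it uses Python's cp437 and latin-1 codecs as the code page and code set, and the function name translate is hypothetical.

```python
# Illustrative sketch: re-encode text from a DOS code page to a Linux code
# set, writing an asterisk for any character the target table lacks.

def translate(data, source="cp437", target="latin-1", fallback="*"):
    """Re-encode bytes from source to target, marking gaps with fallback."""
    out = bytearray()
    for ch in data.decode(source):
        try:
            out += ch.encode(target)
        except UnicodeEncodeError:
            out += fallback.encode(target)
    return bytes(out)


dos_text = "\u00e6NET \u250c".encode("cp437")  # ae ligature, box corner
print(translate(dos_text))
```

The ae ligature exists in both tables and is carried across with a new code number, while the box-drawing corner has no ISO 8859-1 equivalent and becomes an asterisk.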

You can modify the behavior of dos2unix and unix2dos by using the same options that apply to the charconv command, described in the next section.

Using charconv

The charconv command is a flexible and powerful text conversion tool. charconv is most useful in multilanguage (NLS) environments. (For simple file conversion cases, use dos2unix and unix2dos.) Unlike dos2unix or unix2dos, charconv has no defaults for code pages and code sets. The -i and -o options are required unless you use the -x option.

The syntax of the charconv command is:

charconv [options] sourcefile [targetfile]

The source and target files must not be the same. When neither file parameter is specified, charconv prints a usage message to your screen. If you specify only one file parameter, that file is considered the source file, and the output is written to standard output.

charconv options

When issued from the Linux command line, charconv options must be specified with a hyphen ( - ) rather than a slash ( / ). In the DOS environment, you can use either the slash or the hyphen. Most examples in this appendix show the hyphen, which works in either environment. You can use either uppercase or lowercase options both from the Linux shell and in the DOS environment.

All of the options below can be set on the command line or with the CONVOPTS environment variable, which is described later.

-?
Displays a list of valid charconv options on your screen.

-7
Notifies you if charconv encounters any 8-bit characters and translates them to 7-bit characters. You cannot use this option with -b.

-a
Causes charconv to quit if it encounters a character it cannot translate accurately. This option overrides the -c, -m, and -s options, which allow inexact translations.

-b
Preserves 8-bit (binary) character representations. This option is on by default.

-c x
Specifies x as the character to use for untranslatable characters. The default character is an asterisk (*).

-d
Converts a file from DOS to Linux format by removing carriage returns from the end of each line.

-i tbl
Identifies the code page or code set used for the input file. This option is required unless you use the -x option.

-o tbl
Identifies the code page or code set used for the output file. This option is required unless you use the -x option.

For example, to convert a DOS file from code page 437 to code page 850, use the charconv command as follows:

charconv -i pc437 -o pc850 sourcefile targetfile

-l
Converts text to lowercase.

-m
Converts single, untranslatable characters into multibyte characters, if possible. For example, ¾ is converted to 3/4. (If a multibyte conversion is not possible, charconv uses the default untranslatable character -- see the -c option).

-p
Converts a text file from Linux to DOS format by adding carriage returns to the end of each line.

-q
Quiet mode prevents charconv from printing warning messages and character translation statistics to your screen. This is the default. Use -q if necessary to override the -v option, which causes warning messages and conversion statistics to be displayed.

-s
Converts single, untranslatable characters into the best single-character translation. For example, the DOS graphic character for upper left corner is converted to +. (If there is no best single translation, charconv uses the default untranslatable character -- see the -c option).

-u
Converts text to uppercase.

-v
Displays warning messages and character conversion statistics on your screen.

-x
Specifies that charconv should not translate from code page to code set or the reverse. Use this option with the -d or -p options when you want to translate between the DOS and Linux operating systems without changing code pages.

When you specify -x, charconv does not allow you to use the -i or -o options.

-z
Causes charconv to stop processing when it encounters the DOS end-of-file (EOF) character, ^Z. This option removes the end-of-file marker when text is converted from DOS to Linux. By default, charconv converts the whole file.
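
The format changes made by the -d, -p, and -z options amount to simple byte edits. The Python sketch below imitates them on raw data to show the effect. It is an approximation under assumptions, not the charconv source, and the function names are hypothetical.

```python
# Illustrative sketch of the line-ending and end-of-file handling that the
# -d, -p, and -z options describe.

DOS_EOF = b"\x1a"  # the ^Z character


def dos_to_linux(data, stop_at_eof=False):
    """Imitate -d, and -z when stop_at_eof is True."""
    if stop_at_eof:
        data = data.split(DOS_EOF, 1)[0]  # drop ^Z and everything after it
    return data.replace(b"\r\n", b"\n")


def linux_to_dos(data):
    """Imitate -p: end every line with carriage return plus line feed."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


print(dos_to_linux(b"line one\r\nline two\r\n\x1a", stop_at_eof=True))
print(linux_to_dos(b"line one\nline two\n"))
```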

Using the CONVOPTS environment variable

The CONVOPTS environment variable allows you to set charconv options once, so that you only need to type charconv followed by the file names to accomplish a task.

For example, suppose you want to convert a series of files from code page 437 to code page 850 and make all of the characters uppercase.

If you use charconv in the DOS environment, set the CONVOPTS environment variable on the DOS command line as follows:

set convopts=/u /i pc437 /o pc850

If you use charconv at the Linux shell, define the CONVOPTS variable as follows (this example assumes you use the Bourne shell):

CONVOPTS="-u -ipc437 -opc850"; export CONVOPTS

Now, you can just type the following line for each file:

charconv sourcefile targetfile

and charconv automatically converts from code page 437 to 850 and makes all characters uppercase.
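
If you keep one set of options for both environments, remember that the Linux shell form requires hyphens. The Python sketch below rewrites a DOS-style option string into the hyphen form. The helper name to_linux_options is hypothetical, and the sketch assumes that no option value itself begins with a slash.

```python
# Illustrative sketch: turn DOS-style /x switches into the -x form that
# charconv requires at the Linux shell.

def to_linux_options(dos_options):
    """Rewrite '/x' switches as '-x'; other words pass through unchanged."""
    words = []
    for word in dos_options.split():
        if word.startswith("/") and len(word) > 1:
            word = "-" + word[1:]
        words.append(word)
    return " ".join(words)


print(to_linux_options("/u /i pc437 /o pc850"))  # -u -i pc437 -o pc850
```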

