Locales mini-HOWTO
Peeter Joot, peeter_joot@vnet.ibm.com
v1.5, 21 July 1997
This document describes how to set up your Linux machine to use
locales.
1. Introduction
This is really a description of what I had to do to get localedef
installed, compile some locales, and try them out. I did this just
for fun, and thought that perhaps some people would be interested in
trying it out themselves. Once it is set up you should be able to use
NLS enabled applications with the locale of your choice. After a
while, locale support should be part of the standard distributions,
and most of this mini-HOWTO will be redundant.
2. What is a "locale" anyhow?
Locales encapsulate some of the language/culture specific things that
you shouldn't hard code in your programs.
If you have various locales installed on your computer then you can
select via the following list of environment variables how a locale
sensitive program will behave. The default locale is the C, or POSIX
locale which is hard coded in libc.
LANG
This sets the locale, but can be overridden with any other
LC_xxxx environment variables
LC_COLLATE
Sort order.
LC_CTYPE
Character definitions, uppercase, lowercase, ... These are used
by the functions like toupper, tolower, islower, isdigit, ...
LC_MONETARY
Contains the information necessary to format money in the
fashion expected. It has the definitions of things like the
thousands separator, decimal separator, and what the monetary
symbol is and how to position it.
LC_NUMERIC
Thousands, and decimal separators, and the numeric grouping
expected.
LC_TIME
How to specify the time, and date. This has the things like the
days of the week, and months of the year in abbreviated, and non
abbreviated form.
LC_MESSAGES
Yes, and No expressions.
LC_ALL
This sets the locale, and overrides any other LC_xxxx
environment variables.
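The precedence among these variables can be sketched in a few lines of C.
The helper below is hypothetical (effective_locale is not a libc
function); it assumes the usual POSIX rule that a non-empty LC_ALL wins,
then the category's own variable, then LANG, and otherwise the C locale.

```c
#include <stddef.h>

/* Hypothetical helper, not a libc function: pick the locale name that
 * applies to one category, given the values of LC_ALL, that category's
 * own variable (for example LC_TIME) and LANG.  Pass NULL for a
 * variable that is unset; an empty string counts as unset too. */
const char *effective_locale(const char *lc_all,
                             const char *lc_category,
                             const char *lang)
{
    if (lc_all != NULL && lc_all[0] != '\0')
        return lc_all;              /* LC_ALL overrides everything */
    if (lc_category != NULL && lc_category[0] != '\0')
        return lc_category;         /* then the LC_xxxx variable */
    if (lang != NULL && lang[0] != '\0')
        return lang;                /* then LANG */
    return "C";                     /* the built-in default locale */
}
```

A caller would typically pass getenv("LC_ALL"), getenv("LC_TIME") and
getenv("LANG"). In a real program setlocale(LC_TIME, "") performs this
lookup for you; the sketch only makes the order visible.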
Here are some example locale names; there are lots more.
en_CA
English Canadian.
en_US
US English.
de_DE
Germany's German.
fr_FR
France's French.
If you are writing a program, and want it to be usable internationally,
you should utilize locales. The most glaring reason for this is that
not everybody is going to use the same character set/code page as you.
Make sure in your programs that you don't do things like:
/* check for alphabetic characters */
if ( (( c >= 'a') && ( c <= 'z' )) ||
(( c >= 'A') && ( c <= 'Z' )) ) { ... }
If you write that type of code your program assumes that the
user/file/... is ASCII and nothing but ASCII, and it does not respect
the code page definitions of the user's locale. For example it
precludes characters such as a-umlaut which would be used in a German
environment. What you should do instead is use the locale sensitive
functions like isalpha(). If your program does explicitly require use
of only US-ASCII alphabetics, you should still use the isalpha() function,
but you must also either do setlocale(LC_CTYPE,"C") or set the LANG,
LC_CTYPE, or LC_ALL environment variables to "C".
Locales allow a large degree of flexibility and make certain
assumptions that a programmer may have made in ASCII based C programs
invalid.
For instance, you cannot assume the code positions of characters.
There is nothing stopping you from creating a charmap file that
defines the code position of 'A' to be 0xC1 rather than 0x41. This is
in fact the code point mapping for 'A' in IBM code page 37, used on
mainframes, while the former is used for US-ASCII, iso8859-x, and
others.
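To make the point concrete, here is an illustrative sketch of where the
unaccented uppercase letters sit in IBM code page 37. The function name
cp037_upper_code is invented for this example, and it covers only the
26 basic letters.

```c
#include <string.h>

/* Illustrative only: return the code position that IBM code page 37
 * assigns to an unaccented uppercase letter, or -1 for anything else.
 * The letters sit in three separate runs (A-I, J-R, S-Z), so a range
 * test such as (c >= 'A' && c <= 'Z') also matches non-letters there. */
int cp037_upper_code(char letter)
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;
    int i;

    if (letter == '\0')
        return -1;
    p = strchr(letters, letter);
    if (p == NULL)
        return -1;
    i = (int)(p - letters);
    if (i < 9)
        return 0xC1 + i;           /* A-I: 0xC1..0xC9 */
    if (i < 18)
        return 0xD1 + (i - 9);     /* J-R: 0xD1..0xD9 */
    return 0xE2 + (i - 18);        /* S-Z: 0xE2..0xE9 */
}
```

The lookup goes through a string rather than arithmetic on 'A', so the
sketch itself does not depend on the character set it is compiled with.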
The basic idea is different people speak different languages, expect
different sorting orders, use different code pages, and live in
different countries. Locales and locale sensitive programs give one a
means to respect such things, and handle them accordingly. It is not
really much extra work to do so, it just requires a slightly different
frame of mind when writing programs.
3. Notes.
· In order to set up locales on my machine I had to upgrade a few
things. Apparently ftp.tu-clausthal.de:/pub/linux/SLT/nls contains
an a.out version of locale and localedef (in the file
nlsutils-0.5.tar.gz), so if you don't have an ELF system, or don't
want one, you can use that instead. There is probably a copy of the
nlsutils package some other place, but I have not looked for it. I
hadn't known that there was a stand alone version of locale and
localedef, and kind of figured that you would have to have the
corresponding libc installed. Because of this a lot of this HOWTO
is just a log of what I had to do to upgrade libc and family. If
you do this as I have, you will need to be running an ELF system,
or upgrade to one as you set up your locales.
· The sorts of system upgrades that I did are the same sort of
upgrades that have to be done to upgrade from a.out to ELF. If you
haven't done this, or if you have upgraded to ELF by reinstalling
Linux then you should get the recent ELF HOWTO from a sunsite
mirror. This is an excellent guide, and gives additional guidance
for installing libc, ld.so, and other ELF system upgrades.
· For anything that you install, read the appropriate release notes,
or README type files. If you mess up your system by
misinterpreting something that I say here, or ( hopefully not ) by
doing something that I say in here, please don't blame me.
· Mis-installing a new libc, and ld.so, could leave you with an
unbootable system. You probably ought to have a boot disk handy,
and make sure any critical, non-replaceable, data is backed up.
4. What you need.
A few things need to be downloaded from various places. Everything
here except for the locale source files can be obtained from
sunsite.unc.edu, tsx-11.mit.edu, or, preferably, a local mirror of
these sites. When I did this originally I used libc-5.2.18, which is
now quite out of date. As of now I have been told that the current
libc is 5.4.17, and this substitution has been made below. However,
libc 5.4.17 will likely be old before you can blink, so just use the
latest version when you do this.
You may want to consider using glibc (gnu libc) rather than Linux libc
5 for any internationalization work. As of now glibc 2.0.4 (gnu libc)
is available but no distributions have started using it as the
standard libc yet (at least for Intel based Linux distributions). As
well as being fully reentrant and having built in threading support,
glibc is fully internationalized and has excellent
internationalization support for programming. What
internationalization has been done in libc 5 has been mostly taken
from glibc. The locales and charmaps for glibc are bundled with the
glibc locale add on.
If you opt for using glibc then you can ignore this mini-howto.
Including the locale add on in the glibc compilation and installation
is trivial, and is covered in the glibc installation documentation.
Be warned that a full upgrade is not a trivial job! I am hoping that
redhat (which I use) will have a glibc based release soon, as I am not
inclined to recompile my entire system.
· locale, and charmap sources --- These are what you compile using
localedef.
· libc-5.4.17.bin.tar.gz --- the ELF shared libraries for the c and
math libraries. Note that the precompiled program localedef for
libc.5.4.17 is apparently corrupt and creates LC_CTYPE with an invalid
magic number. This probably means that an older localedef got into
the binary distribution.
· libc-5.4.17.tar.gz --- the source code for the ELF shared
libraries. You may need this to compile localedef.
· make-3.74.tar.gz --- you may need to compile make to incorporate a
patch to fix the dirent bug.
· release.libc-5.2.18 --- these release notes have the patch to make
make. It's been a while since this make bug happened, and it is
likely that you don't have to worry about it.
· ld.so-1.7.12+ --- the dynamic linker.
· ELF gcc-2.7.2+ --- to compile things.
· an ELF kernel ( eg. 2.0.xx ) --- to compile things.
· binutils 2.6.0.2+ --- to compile things.
There are probably lots of places that you can get locale sources. I
have found public domain locale and charmap sources at
dkuug.dk:/i18n/WG15-collection/locales
<ftp://dkuug.dk/i18n/WG15-collection/locales> and
dkuug.dk:/i18n/WG15-collection/charmaps
<ftp://dkuug.dk/i18n/WG15-collection/charmaps> respectively.
5. Installing everything.
This is what I did to install everything. I already had an ELF system
( compiler, kernel, ... ) installed before I did this.
1. First I installed the binutils package.
tar xzf binutils-2.6.0.2.bin.tar.gz -C /
2. Next I installed the dynamic linker:
tar zxf ld.so-1.7.12.tar.gz -C /usr/src
cd /usr/src/ld.so-1.7.12
sh instldso.sh
3. Next I installed the libc binaries. See release.libc-5.4.17 for
more instructions.
rm -f /usr/lib/libc.so /usr/lib/libm.so
rm -f /usr/include/iolibio.h /usr/include/iostdio.h
rm -f /usr/include/ld_so_config.h /usr/include/localeinfo.h
rm -rf /usr/include/netinet /usr/include/net /usr/include/pthread
tar -xzf libc-5.4.17.bin.tar.gz -C /
4. Now ldconfig must be run to locate the new shared libraries.
ldconfig -v
5. There is a bug that was fixed in libc that breaks make, and some
other programs. Here is what I did in order to rebuild and install
make.
tar zxf make-3.74.tar.gz -C /usr/src
cd /usr/src/make-3.74
patch < /whereever_you_put_it/release.libc-5.4.17
./configure --prefix=/usr
sh build.sh
./make install
cd ..
rm -rf make-3.74
6. Now localedef can be compiled and installed.
mkdir /usr/src/libc
tar zxf libc-5.4.17.tar.gz -C /usr/src/libc
cd /usr/src/libc
cd include
ln -s /usr/src/linux/include/asm .
ln -s /usr/src/linux/include/linux .
cd ../libc
./configure
# I am not sure if these two makes are necessary, but just to be safe :
make clean ; make depend
cd locale
make programs
mv localedef /usr/local/bin
mv locale /usr/local/bin
7. Put the charmaps where localedef will find them. This uses the
charmaps and locale sources which I downloaded from the dkuug.dk ftp
site as charmaps.tar, and locales.tar respectively. The older
localedef (5.2.18) looked in /usr/share/nls/charmap for charmap
sources, but now localedef looks in /usr/share/i18n/charmaps and
/usr/share/i18n/locales by default for the charmap and locale
sources:
mkdir /usr/share/i18n
mkdir /usr/share/i18n/charmaps
mkdir /usr/share/i18n/locales
tar xf charmaps.tar -C /usr/share/i18n/charmaps
tar xf locales.tar -C /usr/share/i18n/locales
The newer localedef (5.4.17) has been made smarter and will look for
other locale source files when handling the `copy' statement, whereas
the older localedef needed to have the locale objects already created
in order to handle the copy statement. This list of commands has the
dependencies sorted out and can be used to generate all the locale
objects regardless of which libc version is being used, but you should
now be able to create only the ones that you wish.
localedef -ci en_DK -f ISO_8859-1:1987 en_DK
localedef -ci sv_SE -f ISO_8859-1:1987 sv_SE
localedef -ci fi_FI -f ISO_8859-1:1987 fi_FI
localedef -ci sv_FI -f ISO_8859-1:1987 sv_FI
localedef -ci ro_RO -f ISO_8859-1:1987 ro_RO
localedef -ci pt_PT -f ISO_8859-1:1987 pt_PT
localedef -ci no_NO -f ISO_8859-1:1987 no_NO
localedef -ci nl_NL -f ISO_8859-1:1987 nl_NL
localedef -ci fr_BE -f ISO_8859-1:1987 fr_BE
localedef -ci nl_BE -f ISO_8859-1:1987 nl_BE
localedef -ci da_DK -f ISO_8859-1:1987 da_DK
localedef -ci kl_GL -f ISO_8859-1:1987 kl_GL
localedef -ci it_IT -f ISO_8859-1:1987 it_IT
localedef -ci is_IS -f ISO_8859-1:1987 is_IS
localedef -ci fr_LU -f ISO_8859-1:1987 fr_LU
localedef -ci fr_FR -f ISO_8859-1:1987 fr_FR
localedef -ci de_DE -f ISO_8859-1:1987 de_DE
localedef -ci de_CH -f ISO_8859-1:1987 de_CH
localedef -ci fr_CH -f ISO_8859-1:1987 fr_CH
localedef -ci en_CA -f ISO_8859-1:1987 en_CA
localedef -ci fr_CA -f ISO_8859-1:1987 fr_CA
localedef -ci fo_FO -f ISO_8859-1:1987 fo_FO
localedef -ci et_EE -f ISO_8859-1:1987 et_EE
localedef -ci es_ES -f ISO_8859-1:1987 es_ES
localedef -ci en_US -f ISO_8859-1:1987 en_US
localedef -ci en_GB -f ISO_8859-1:1987 en_GB
localedef -ci en_IE -f ISO_8859-1:1987 en_IE
localedef -ci de_LU -f ISO_8859-1:1987 de_LU
localedef -ci de_BE -f ISO_8859-1:1987 de_BE
localedef -ci de_AT -f ISO_8859-1:1987 de_AT
localedef -ci sl_SI -f ISO_8859-2:1987 sl_SI
localedef -ci ru_RU -f ISO_8859-5:1988 ru_RU
localedef -ci pl_PL -f ISO_8859-2:1987 pl_PL
localedef -ci lv_LV -f BALTIC lv_LV
localedef -ci lt_LT -f BALTIC lt_LT
localedef -ci iw_IL -f ISO_8859-8:1988 iw_IL
localedef -ci hu_HU -f ISO_8859-2:1987 hu_HU
localedef -ci hr_HR -f ISO_8859-4:1988 hr_HR
localedef -ci gr_GR -f ISO_8859-7:1987 gr_GR
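Once the locale objects have been generated, one way to check from a
program whether a given locale can actually be loaded is to ask
setlocale and look at its return value. This is only a sketch: the name
locale_available is invented here, and the 256 byte buffer is an
arbitrary assumption about how long a saved locale string can get.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: return 1 if setlocale() accepts the given
 * locale name, 0 otherwise.  The locale that was active before the
 * call is restored afterwards. */
int locale_available(const char *name)
{
    char saved[256];
    const char *current = setlocale(LC_ALL, NULL);  /* query only */
    int ok;

    if (current == NULL || strlen(current) >= sizeof saved)
        return 0;
    /* Copy the name: a later setlocale() call may overwrite it. */
    strcpy(saved, current);
    ok = setlocale(LC_ALL, name) != NULL;
    setlocale(LC_ALL, saved);
    return ok;
}
```

For example, locale_available("de_DE") would be expected to return 1
only after the de_DE objects have been created with localedef, while
"C" and "POSIX" are always available.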
6. Now what.
After doing all the stuff above you should now be able to use the
locales that have been created. Here is a simple example program.
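/* test.c : a simple test to see if the locales can be loaded, and
 * used */
#include <locale.h>
#include <stdio.h>
#include <time.h>
int main(void)
{
    time_t t;
    struct tm *tm_utc;
    char buf[256];
    /* pick up LC_TIME (or LC_ALL/LANG) from the environment */
    if (setlocale(LC_TIME, "") == NULL)
        fprintf(stderr, "locale not installed, using the C locale\n");
    time(&t);
    tm_utc = gmtime(&t);
    strftime(buf, sizeof buf, "%c", tm_utc);
    printf("%s\n", buf);
    return 0;
}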
You can use the locale program to see what your current locale
environment variable settings are.
$ # compile the simple test program above, and run it with
$ # some different locale settings
$ gcc -s -o Test test.c
$ # see what the current locale is :
$ locale
LANG=POSIX
LC_COLLATE="POSIX"
LC_CTYPE="POSIX"
LC_MONETARY="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_MESSAGES="POSIX"
LC_ALL=
$ # Ho, hum... we're using the boring C locale
$ # let's change to English Canadian:
$ export LC_TIME=en_CA
$ ./Test
Sat 23 Mar 1996 07:51:49 PM
$ # let's try French Canadian:
$ export LC_TIME=fr_CA
$ ./Test
sam 23 mar 1996 19:55:27
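If you would rather compare locales from inside one program than by
re-exporting LC_TIME, something along these lines could be used. It is a
sketch only: format_date is a made-up name, and the "%A %d %B %Y" layout
is just one choice of format.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format a calendar date using the LC_TIME rules
 * of the named locale.  Returns the number of characters written, or 0
 * if the locale cannot be loaded or the buffer is too small.  Note that
 * it leaves LC_TIME set to the requested locale. */
size_t format_date(char *buf, size_t size, const char *locale_name,
                   int year, int month, int day)
{
    struct tm tm = {0};

    tm.tm_year = year - 1900;
    tm.tm_mon = month - 1;
    tm.tm_mday = day;
    tm.tm_hour = 12;        /* midday avoids edge cases around midnight */
    tm.tm_isdst = -1;       /* let mktime() work out daylight saving */
    if (mktime(&tm) == (time_t)-1)   /* also fills in tm_wday */
        return 0;
    if (setlocale(LC_TIME, locale_name) == NULL)
        return 0;
    return strftime(buf, size, "%A %d %B %Y", &tm);
}
```

Calling format_date(buf, sizeof buf, "C", 1996, 3, 23) gives
"Saturday 23 March 1996". With "fr_CA" the day and month names should
come out in French, provided that locale has been installed.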
7. catopen bug fix.
Installing the locales fixes a bug (feature ?) that is in the catopen
function in Linux libc. Say you create a program that uses message
catalogs, and you create a German catalog and put it in
/home/peeter/catalogs/de_DE.
Now upon doing the following, without the de_DE locale installed :
export LC_MESSAGES=de_DE
export NLSPATH=/home/peeter/catalogs/%L/%N.cat:$NLSPATH
the German message catalog does not get opened, and the default
messages in the catgets calls are used.
This is because catopen does a setlocale call to get the right message
category, and that setlocale call fails even though the environment
variable has been set. catopen then attempts to load the message catalog
substituting "C" for all the "%L"'s in the NLSPATH.
You can still use your message catalog without installing the locale,
but you would have to explicitly set the "%L" part of the NLSPATH like
export NLSPATH=/home/peeter/catalogs/de_DE/%N.cat:$NLSPATH
but this defeats the whole purpose of the locale category environment
variables.
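The substitution that happens to an NLSPATH entry can be illustrated
with a small function. This is not the libc implementation, just a
sketch of the idea; the name expand_nlspath is invented, and only the
%L, %N and %% fields are handled.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch, not the libc implementation: expand the %L
 * (locale name) and %N (catalog name) fields of one NLSPATH template.
 * "%%" yields a literal '%'; other % fields are dropped because this
 * sketch does not handle them.  Returns 0 on success, -1 if the result
 * does not fit in out. */
int expand_nlspath(const char *tmpl, const char *locale_name,
                   const char *catalog, char *out, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    for (; *tmpl != '\0'; tmpl++) {
        const char *piece = tmpl;   /* default: copy one literal char */
        size_t len = 1;

        if (*tmpl == '%' && tmpl[1] != '\0') {
            tmpl++;
            if (*tmpl == 'L') {
                piece = locale_name;
                len = strlen(piece);
            } else if (*tmpl == 'N') {
                piece = catalog;
                len = strlen(piece);
            } else if (*tmpl == '%') {
                piece = "%";
            } else {
                len = 0;            /* unsupported field: skip it */
            }
        }
        if (used + len >= size)     /* keep room for the final NUL */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

With the template /home/peeter/catalogs/%L/%N.cat and the catalog name
msg, the locale name de_DE gives /home/peeter/catalogs/de_DE/msg.cat.
When the setlocale call fails and "C" is substituted instead, the result
is /home/peeter/catalogs/C/msg.cat, which is why the German catalog is
never found.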
8. Questions and Answers.
This section could grow into a FAQ, but isn't really one yet.
8.1. msgcat question
I am a user of Linux, and have written the following test program:
--------------------------------------------------------------------
#include <stdio.h>
#include <locale.h>
#include <features.h>
#include <nl_types.h>
int main(int argc, char ** argv)
{
    nl_catd catd;
    setlocale(LC_MESSAGES, "");
    catd = catopen("msg", MCLoadBySet);
    /* pass the message as data, not as the format string */
    fprintf(stderr, "%s", catgets(catd, 1, 1, "locale message fail\n"));
    catclose(catd);
    return 0;
}
--------------------------------------------------------------------
$ msg.m
$set 1
1 locale message pass\n
--------------------------------------------------------------------
If I use an absolute path in catopen, like
catopen("/etc/locale/msg.cat",MCLoadBySet);, I get the right result.
But if I use the example above, catopen returns -1 (failure).
8.2. msgcat answer
This question is sort of answered in the previous section, but here is
some additional information.
There are a number of valid places where you can put your message
catalogs. Even though you may not have NLSPATH explicitly defined in
your environment settings it is defined in libc as follows :
$ strings /lib/libc.so.5.4.17 | grep locale | grep %L
/etc/locale/%L/%N.cat:/usr/lib/locale/%L/%N.cat:/usr
/lib/locale/%N/%L:/usr/share/locale/%L/%N.cat:/usr/
local/share/locale/%L/%N.cat
so if you have done one of :
$ export LC_MESSAGES=en_CA
$ export LC_ALL=en_CA
$ export LANG=en_CA
With the NLSPATH above and the specified environment, the
catopen("msg", MCLoadBySet); call should work if your message catalog has
been copied to any one of :
/etc/locale/en_CA/msg.cat
/usr/lib/locale/en_CA/msg.cat
/usr/lib/locale/msg/en_CA
/usr/share/locale/en_CA/msg.cat
/usr/local/share/locale/en_CA/msg.cat
This, however, will not work if you don't have the en_CA locale
installed because the setlocale will fail, and "C" will be substituted
for "%L" in the catopen routine ( rather than "en_CA" ).
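Since catopen can fail for reasons like this, it may be worth checking
its return value and falling back to a built in message. The following
is a minimal sketch under some assumptions: load_message is a made-up
name, and it uses the POSIX flag NL_CAT_LOCALE rather than the
MCLoadBySet flag from the question above.

```c
#include <nl_types.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy message (set_id, msg_id) from the named
 * catalog into out, or copy fallback when the catalog cannot be opened
 * or the message is missing.  Returns 1 if the catalog was opened,
 * 0 if the fallback was used because catopen() failed. */
int load_message(const char *catalog, int set_id, int msg_id,
                 const char *fallback, char *out, size_t size)
{
    nl_catd catd;
    const char *msg = fallback;
    int opened = 0;

    if (size == 0)
        return 0;
    catd = catopen(catalog, NL_CAT_LOCALE);
    if (catd != (nl_catd)-1) {
        opened = 1;
        msg = catgets(catd, set_id, msg_id, fallback);
    }
    /* Copy before catclose(): the pointer from catgets() may refer to
     * catalog storage that is released when the catalog is closed. */
    strncpy(out, msg, size - 1);
    out[size - 1] = '\0';
    if (opened)
        catclose(catd);
    return opened;
}
```

As in the test program above, setlocale(LC_MESSAGES, "") should be
called first so that the "%L" part of NLSPATH has a locale name to work
with. A return value of 0 tells you that the catalog was not found at
all, which is the situation described in the question.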
9. More information.
Well that's it. Hopefully this guide has been some help to you.
There are probably lots of places that you can look for additional
information on writing locale sensitive programs, and documents on
internationalization, and localization in general. I'll bet that if
you browse the web a bit you will be able to find a lot of info.
Ulrich Drepper who implemented much of the gnu internationalization
code has some information about internationalization and localization
on his home page <http://i44www.info.uni-karlsruhe.de/~drepper>, and
you can look there to start. There is also some information in the
info pages for libc, and of course, there are always man pages.