
2. What is a "locale" anyhow?

Locales encapsulate the language- and culture-specific conventions that you shouldn't hard-code in your programs.

If you have various locales installed on your computer, you can use the environment variables listed below to select how a locale-sensitive program behaves. The default is the C (or POSIX) locale, which is hard-coded in libc.

LANG

This sets the locale for all categories, but any of the LC_xxxx environment variables can override it.

LC_COLLATE

Sort order.

LC_CTYPE

Character classification and case conversion (uppercase, lowercase, ...). These definitions are used by functions such as toupper, tolower, islower, isdigit, ...

LC_MONETARY

Contains the information needed to format monetary amounts in the expected fashion: the thousands separator, the decimal separator, the currency symbol, and where to position it.

LC_NUMERIC

Thousands and decimal separators, and the expected numeric grouping.

LC_TIME

How to format the time and date. This includes things like the names of the days of the week and the months of the year, in abbreviated and unabbreviated form.

LC_MESSAGES

The language used for program messages, and the expressions for yes and no answers.

LC_ALL

This sets the locale and overrides LANG and all of the other LC_xxxx environment variables.
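The precedence among these variables can be sketched in a few lines of C. This is only an illustration: pick_locale(), is_set() and demo_precedence() are hypothetical names, not libc functions, and the sketch assumes that an empty value counts the same as an unset one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An unset (NULL) or empty variable does not select a locale. */
int is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/*
 * Hypothetical helper, not a libc function: given the values of LC_ALL,
 * one category variable (for example LC_TIME) and LANG, return the
 * locale name that wins for that category.
 */
const char *pick_locale(const char *lc_all, const char *lc_category,
                        const char *lang)
{
    if (is_set(lc_all))
        return lc_all;          /* LC_ALL overrides everything */
    if (is_set(lc_category))
        return lc_category;     /* then the category's own variable */
    if (is_set(lang))
        return lang;            /* then LANG */
    return "C";                 /* built-in default */
}

/* Usage: each assertion exercises one rule of the precedence. */
void demo_precedence(void)
{
    assert(strcmp(pick_locale("de_DE", "fr_FR", "en_US"), "de_DE") == 0);
    assert(strcmp(pick_locale(NULL, "fr_FR", "en_US"), "fr_FR") == 0);
    assert(strcmp(pick_locale("", NULL, "en_CA"), "en_CA") == 0);
    assert(strcmp(pick_locale(NULL, NULL, NULL), "C") == 0);
}
```

In a real program the three arguments would come from getenv(), and you rarely need to do this by hand: calling setlocale(LC_ALL, "") asks libc to consult the environment for you.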

Here are some other locales; there are many more.

en_CA

Canadian English.

en_US

US English.

de_DE

German as used in Germany.

fr_FR

French as used in France.

If you are writing a program and want it to be usable internationally, you should use locales. The most glaring reason for this is that not everybody is going to use the same character set/code page as you.

Make sure in your programs that you don't do things like:

/* check for alphabetic characters */
if ( (( c >= 'a') && ( c <= 'z' )) ||
     (( c >= 'A') && ( c <= 'Z' )) ) { ... }

If you write that type of code, your program assumes that the user/file/... is ASCII and nothing but ASCII, and it does not respect the code page definitions of the user's locale. For example, it precludes characters such as a-umlaut, which would be used in a German environment. What you should do instead is use the locale-sensitive functions like isalpha(). If your program explicitly requires only US-ASCII alphabetics, you should still use the isalpha() function, but you must also either call setlocale(LC_CTYPE,"C") or set the LANG, LC_CTYPE, or LC_ALL environment variable to "C".
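As a minimal sketch of the locale-sensitive approach, the function below counts alphabetic characters with isalpha() instead of comparing against 'a'..'z'. The name count_alpha() is made up for this example; it is not a library function.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Count the characters in s that the current LC_CTYPE locale
 * classifies as alphabetic.
 */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Cast first: passing a negative char value to isalpha()
           is undefined behaviour. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```

A program would call setlocale(LC_CTYPE, "") once at startup to pick up the user's locale, or setlocale(LC_CTYPE, "C") to restrict the test to US-ASCII letters, and then call count_alpha() as usual. Note that isalpha() looks at one byte at a time, so this sketch only suits single-byte code pages.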

Locales allow a large degree of flexibility, and they invalidate certain assumptions that a programmer may have made in ASCII-based C programs.

For instance, you cannot assume the code positions of characters. There is nothing stopping you from creating a charmap file that defines the code position of 'A' to be 0xC1 rather than 0x41. In fact, 0xC1 is the code point for 'A' in IBM code page 37, used on mainframes, while 0x41 is the one used in US-ASCII, ISO 8859-x, and others.
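One common place where this assumption sneaks in is arithmetic such as c - 'A', which only works if the letters occupy consecutive code positions. That holds in ASCII but not in EBCDIC code pages such as code page 37, where the alphabet is split into several ranges. A hedged sketch of a position-independent alternative follows; upper_letter_index() and demo_letter_index() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 0..25 for 'A'..'Z' and -1 for anything else, without
 * assuming that the letters have consecutive code positions.
 */
int upper_letter_index(char c)
{
    static const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    if (c == '\0')      /* strchr() would match the terminator */
        return -1;
    p = strchr(alphabet, c);
    return (p != NULL) ? (int)(p - alphabet) : -1;
}

/* Usage: the results depend only on the order of the table above. */
void demo_letter_index(void)
{
    assert(upper_letter_index('A') == 0);
    assert(upper_letter_index('J') == 9);
    assert(upper_letter_index('Z') == 25);
    assert(upper_letter_index('a') == -1);
}
```

The lookup table costs a short search per character, but the result no longer depends on which code page the compiler and the data happen to use.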

The basic idea is that different people speak different languages, expect different sorting orders, use different code pages, and live in different countries. Locales and locale-sensitive programs give you a means to respect such things and handle them accordingly. It is not really much extra work to do so; it just requires a slightly different frame of mind when writing programs.
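As one small instance of that frame of mind, sorting strings with strcoll() rather than strcmp() lets the LC_COLLATE category decide the order. The sketch below assumes an array of C strings; compare_by_locale() and sort_words() are illustrative names, not library functions.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: order two strings by the current LC_COLLATE rules. */
int compare_by_locale(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;

    return strcoll(*left, *right);
}

/* Sort an array of count strings in place using the locale's collation. */
void sort_words(const char **words, size_t count)
{
    assert(words != NULL || count == 0);
    if (count > 0)
        qsort(words, count, sizeof words[0], compare_by_locale);
}
```

After setlocale(LC_COLLATE, ""), the same call sorts according to whatever locale the user has selected. In the C locale strcoll() compares like strcmp(), so on an ASCII system uppercase letters sort before lowercase ones.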

