
Character Sets, Codesets, and Encodings

One major difference between nationalized and internationalized software is that internationalized software must support a wide variety of methods for encoding characters. Developers of internationalized software can no longer assume that text is always ASCII. The following three terms describe groupings of characters:

character set: An abstract collection of characters.
codeset: A character set with exactly one associated numerical encoding for each character. The English alphabet is a character set; ASCII is a codeset.
encoding: A set of characters and associated numbers; however, this term is more general than "codeset." A single encoding may include multiple codesets; Extended UNIX Code (EUC), for instance, is an encoding that provides for four codesets in one data stream.
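
To make the distinction between an encoding and its codesets concrete, the following is a minimal sketch of how a program might tell which of the four EUC codesets a character belongs to by examining its first byte. It assumes the usual EUC conventions: codeset 0 is ASCII, codeset 1 characters begin with a byte in the range 0xA1 through 0xFE, and codesets 2 and 3 are introduced by the single-shift bytes SS2 (0x8E) and SS3 (0x8F). The function name euc_codeset and the macro names are hypothetical, chosen for illustration only; they are not part of any system library.

```c
#include <stddef.h>

/* Single-shift bytes that introduce EUC codesets 2 and 3 (assumed values). */
#define EUC_SS2 0x8E
#define EUC_SS3 0x8F

/*
 * Hypothetical helper: return the EUC codeset (0..3) of the character
 * whose first byte is s[0], or -1 if that byte cannot start a character.
 * Only the lead byte is inspected; the number of bytes that follow
 * depends on the locale and is not checked here.
 */
int euc_codeset(const unsigned char *s)
{
    unsigned char lead = s[0];

    if (lead < 0x80)
        return 0;               /* codeset 0: ASCII */
    if (lead == EUC_SS2)
        return 2;               /* codeset 2: introduced by SS2 */
    if (lead == EUC_SS3)
        return 3;               /* codeset 3: introduced by SS3 */
    if (lead >= 0xA1 && lead <= 0xFE)
        return 1;               /* codeset 1: high bit set, no shift byte */
    return -1;                  /* not a valid lead byte */
}
```

In a Japanese EUC stream, for example, the byte 0x41 ("A") would be reported as codeset 0, and a two-byte character beginning with 0xA4 as codeset 1. A real application would normally rely on the locale-aware multibyte functions rather than inspecting bytes directly.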

This section describes these topics:

Eight-Bit Cleanliness
Character Representation
Multibyte Characters
Wide Characters
Reading Input Data

For information on installing and using fonts with an application, refer to Chapter 5, "Working With Fonts."
