Next | Prev | Up | Top | Contents | Index

Character Classification and ctype

The ctype.h header file is described in the ctype(3C) reference page and defines macros to determine various kinds of information about a given character: isalpha(), isupper(), islower(), isdigit(), isxdigit(), isalnum(), isspace(), ispunct(), isprint(), isgraph(), iscntrl(), and isascii().


The Issue

When programmers knew that a character set was ASCII, some convenient assumptions could be made about characters and letters. It was common for programmers to do arithmetic with the ASCII code values in order to perform some simple operations. For example, raising a character to upper case could be done by subtracting the difference between the code for a and the code for A. Numeric characters could be identified by inspection: if they fell between 0 and 9, they were numeric; otherwise, they weren't. You could tell if a character was (for instance) printable, a letter, or a symbol by comparing to known encoding values. Macros for such activity have long been available in ctype.h, but lots of programs did character arithmetic anyway. Since character encoding and linguistic semantics are completely independent, such arithmetic in an internationalized program leads to unpleasant results.

Furthermore, characters exist outside of ASCII that break some non-arithmetic assumptions. Consider the German character ß which is a lowercase alphabetic character (letter), yet has no uppercase. Consider also French (as written in France), where the uppercase of é is E, not É.

Clearly, the programmer of an internationalized application has no way of directly computing all the character associations that were available in English under ASCII.


The Solution

Strict avoidance of arithmetic on character values should remove any trouble in this area. The macros in ctype.h are table-driven and are therefore locale-sensitive. If you think of characters as abstract characters rather than as the numbers used to represent them, you can avoid pitfalls in this area.


Next | Prev | Up | Top | Contents | Index