Wide Characters

A wide character (WC or wchar) is a data object of type wchar_t, an integral type guaranteed to be large enough to hold the system's largest numerical character code. wchar_t is defined in stdlib.h. Under IRIX 4.0.x, sizeof(wchar_t) was 1; in IRIX 5.1 and later, it is 4. All wchars on a system are the same size, independent of locale, encoding, or any other factor.
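One way to see the fixed width is to measure a wide string literal: every element, including the terminating null, occupies sizeof(wchar_t) bytes regardless of the current locale. The sketch below assumes a C99 compiler; the helper names wide_literal_bytes() and print_wchar_size() are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: bytes occupied by the wide literal L"abc".
   It has four wchar_t elements (three characters plus the
   terminating null), so the result is 4 * sizeof(wchar_t). */
size_t wide_literal_bytes(void)
{
    static const wchar_t text[] = L"abc";
    return sizeof text;
}

/* Hypothetical helper: print the per-character size. The value is
   fixed at compile time and does not change with the locale. */
void print_wchar_size(void)
{
    printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
    printf("sizeof(L\"abc\") = %zu\n", wide_literal_bytes());
}
```

On IRIX 5.1 and later, the document's figure of 4 bytes per wchar_t would make the literal 16 bytes long.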

Uses for wchar Strings

The single advantage of WC strings is that all characters are the same size. Thus, a string can be treated as an array, and a program can simply index into the array to modify its contents. Most of an application's char-manipulation routines work with little modification beyond changing the type to wchar_t, provided that byte counts and sizeof() expressions are adjusted, since a character no longer occupies one byte.
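As an illustration of editing by index, the sketch below reverses a wide string in place with a plain swap loop. The same loop applied to a multibyte string could split a character's byte sequence; with wchar_t each array element is one whole character. The function name wcs_reverse() is hypothetical, and the sketch uses only the ISO C wcslen() from wchar.h.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: reverse a wide string in place.
   Each element is one complete character, so swapping elements
   from both ends never breaks a character apart. */
void wcs_reverse(wchar_t *s)
{
    size_t len = wcslen(s);

    if (len < 2)
        return;                 /* empty or single character: nothing to do */
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        wchar_t tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}
```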

Applications that perform significant string editing therefore typically keep their strings in WC format while doing that editing. Those WC strings may or may not be converted to or from MB strings at other points in the application.
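The pattern of converting at the boundaries and editing in WC form might look like the following sketch: an MB string is converted with mbstowcs(), edited element by element (here each character is upper-cased with towupper()), and converted back with wcstombs(). The function name mb_upcase() is hypothetical. The sketch assumes setlocale() has already selected the locale whose encoding the input uses, and it relies on the POSIX behavior of returning the required length when the destination pointer is null.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: return a malloc'd MB copy of `mb` with every
   character upper-cased, or NULL on an invalid sequence or
   allocation failure. Editing happens in WC form. */
char *mb_upcase(const char *mb)
{
    size_t wlen = mbstowcs(NULL, mb, 0);   /* POSIX: length query */
    if (wlen == (size_t)-1)
        return NULL;

    wchar_t *ws = malloc((wlen + 1) * sizeof *ws);
    if (ws == NULL)
        return NULL;
    mbstowcs(ws, mb, wlen + 1);            /* room for the null, too */

    for (size_t i = 0; i < wlen; i++)      /* edit as a plain array */
        ws[i] = (wchar_t)towupper((wint_t)ws[i]);

    size_t out_len = wcstombs(NULL, ws, 0);
    if (out_len == (size_t)-1) {
        free(ws);
        return NULL;
    }
    char *out = malloc(out_len + 1);
    if (out != NULL)
        wcstombs(out, ws, out_len + 1);
    free(ws);
    return out;
}
```

The caller owns the returned string and releases it with free().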

Wide characters are often large (4 bytes each in IRIX 5.1 and later), so WC strings are usually less space-efficient than the equivalent MB strings. Applications that do not need to perform string editing probably shouldn't use wchars. If an application must both store and edit large numbers of strings, the developer needs to weigh memory size against code complexity.
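To put a number on the space cost, the sketch below computes how many bytes a string needs in MB form and in WC form, counting the terminating null in both. The function name storage_cost() is hypothetical; the sketch assumes the locale has been set for the string's encoding and uses the POSIX null-destination length query of mbstowcs().

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: bytes needed to store `mb` as-is (*mb_bytes)
   and as a wide string (*wc_bytes), both including the terminating
   null. *wc_bytes is set to 0 if `mb` has an invalid sequence. */
void storage_cost(const char *mb, size_t *mb_bytes, size_t *wc_bytes)
{
    size_t nchars = mbstowcs(NULL, mb, 0);   /* POSIX length query */

    *mb_bytes = strlen(mb) + 1;
    *wc_bytes = (nchars == (size_t)-1)
                    ? 0
                    : (nchars + 1) * sizeof(wchar_t);
}
```

For plain ASCII text each character is one byte in MB form, so with a 4-byte wchar_t the WC copy is four times larger.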

Support Routines for Wide Characters

Analogs to the routines defined in string.h and stdio.h are supplied in libw.a and declared in widec.h. These include routines such as getwchar(), putwchar(), putws(), wscpy(), wslen(), and wsrchr() (see the wcstring(3) reference page).
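The libw signatures are not reproduced here. As a sketch of the same style of use, the code below relies on the ISO C wchar.h counterparts wcsrchr(), wcslen(), and wcscpy(), assumed here to play the roles of wsrchr(), wslen(), and wscpy(); check wcstring(3) before substituting the libw names. The function name wcs_basename() is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy the last '/'-separated component of
   `path` into `dst`, which holds `dstlen` wide characters.
   Returns the number of characters copied, or (size_t)-1 if
   `dst` is too small. */
size_t wcs_basename(wchar_t *dst, size_t dstlen, const wchar_t *path)
{
    const wchar_t *slash = wcsrchr(path, L'/');      /* cf. wsrchr() */
    const wchar_t *name = (slash != NULL) ? slash + 1 : path;
    size_t len = wcslen(name);                       /* cf. wslen() */

    if (len + 1 > dstlen)
        return (size_t)-1;
    wcscpy(dst, name);                               /* cf. wscpy() */
    return len;
}
```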

Conversion to MB Characters

A single wide character is converted to its MB form with wctomb(), and a wide string with wcstombs().
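The two conversion routines can be compared in a short sketch: mb_bytes_by_char() sums the byte counts wctomb() reports for each character, and wcs_to_mb() converts a whole string in one wcstombs() call. Both function names are hypothetical. The sketch assumes the locale is already set, uses MB_LEN_MAX from limits.h as a buffer size large enough for any locale's MB_CUR_MAX, and relies on the POSIX null-destination length query of wcstombs().

```c
#include <limits.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: sum of the MB byte counts wctomb() produces
   for each character of `ws`, or (size_t)-1 if a character has no
   MB form in the current locale. For stateless encodings this
   equals the length wcstombs() reports. */
size_t mb_bytes_by_char(const wchar_t *ws)
{
    char buf[MB_LEN_MAX];
    size_t total = 0;

    (void)wctomb(NULL, 0);      /* reset shift state, if any */
    for (; *ws != L'\0'; ws++) {
        int n = wctomb(buf, *ws);
        if (n < 0)
            return (size_t)-1;
        total += (size_t)n;
    }
    return total;
}

/* Hypothetical helper: convert the whole string with wcstombs().
   Returns a malloc'd MB string, or NULL on failure. */
char *wcs_to_mb(const wchar_t *ws)
{
    size_t len = wcstombs(NULL, ws, 0);   /* POSIX length query */
    if (len == (size_t)-1)
        return NULL;

    char *mb = malloc(len + 1);
    if (mb != NULL)
        wcstombs(mb, ws, len + 1);        /* includes the null */
    return mb;
}
```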