Mac OS X Reference Library Apple Developer
Search

File Encodings and Fonts

Unicode is generally considered the native encoding for Mac OS X and should be used in nearly all situations. Previous versions of Mac OS supported file encodings such as MacRoman but most modern Mac OS X libraries support Unicode inherently. If you use Cocoa or Core Foundation routines, then you will probably never need to worry about other file encodings. If your software supports legacy file formats, however, you might need to consider file encoding issues when importing legacy file formats. The following sections describe some of the issues related to Unicode support and legacy file encodings.

File Systems and Unicode Support

Different file systems in Mac OS X have different levels of Unicode support:

Locking the canonical decomposition to a particular version of Unicode does not exclude usage of characters defined in a newer version of Unicode. Because the Unicode consortium has guaranteed not to add any more precomposed characters, applications can expect to store characters defined in future versions of Unicode without compatibility issues.

Note: Because of implementation differences, erroneous Unicode in filenames on HFS+ volumes may display correctly when entered on Mac OS 9 but appear garbled on Mac OS X. Similarly, erroneous Unicode entered on Mac OS X may appear garbled in Mac OS 9.

All BSD system functions expect their string parameters to be in UTF-8 encoding and nothing else. Code that calls BSD system routines should ensure that the contents of all const *char parameters are in canonical UTF-8 encoding. In a canonical UTF-8 string, all decomposable characters are decomposed; for example, é (0x00E9) is represented as e (0x0065) + ´ (0x0301). To put things into a canonical UTF-8 encoding, use the “file-system representation” interfaces defined in Cocoa and Carbon (including Core Foundation).

Getting Canonical Strings

Both Cocoa and Core Foundation provide routines for accessing canonical and non-canonical Unicode strings. Cocoa string manipulations are all handled through the NSString class and its subclasses. In Core Foundation, you can use the CFStringGetCString and CFStringGetCStringPtr functions to obtain a C string with the desired encoding.

Cocoa Issues

Cocoa employs Unicode for character encoding, making any Cocoa application capable of displaying most human languages. Although Cocoa supports vertical and bidirectional text, the NSTypesetter class only supports layout for horizontal text. If you want to lay out vertical text, you need to define your own custom typesetter class.




Last updated: 2009-08-07

Did this document help you? Yes It's good, but... Not helpful...