Q It appears that $0D is a carriage return, and $09 is a tab, even in Kanji (the
Kanji character set seems to encompass the entire Roman character set), when
they appear as the first (or only) byte of a character. However, we are finding
what appear to be control characters in the second bytes of double-byte
characters. Can we safely assume that $0D is a carriage return, and that any
other common control-character code is that control character, wherever we find
one in a Kanji document? If so, does this hold true for other non-Roman script
systems?
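A All scripts have the same low-ASCII values ($00-$7F), and all double-byte
scripts use only high-ASCII values ($80-$FF) for high-byte (first-byte) values
and $40-$FF for low-byte (second-byte) values. Because no second byte can ever
be below $40, control characters ($00-$1F), digits ($30-$39), and the elementary
punctuation characters in the range $20-$3F are all unambiguous: $0D is always a
carriage return and $09 is always a tab, in Kanji and in every other script.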
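The byte-range rule above can be illustrated with a short Python sketch. It assumes the Shift-JIS lead-byte ranges ($81-$9F and $E0-$FC) as a stand-in for a real script's table; `LEAD_BYTE`, `count_walk`, and `count_naive` are hypothetical names for illustration, not Script Manager calls. A scan that steps over whole double-byte characters and a naive byte-by-byte count agree for $0D, because no second byte is ever below $40, but they disagree for $5C, which can occur as a second byte.

```python
# Hypothetical lead-byte table: True means "this byte starts a
# double-byte character". The Shift-JIS lead-byte ranges are assumed
# here for illustration; a real program should obtain the table for
# its script from the system rather than hard-coding it.
LEAD_BYTE = [0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC for b in range(256)]


def count_walk(data: bytes, target: int, lead=LEAD_BYTE) -> int:
    """Count single-byte `target` characters, stepping over each
    double-byte character so its second byte is never examined."""
    count = 0
    i = 0
    while i < len(data):
        if lead[data[i]] and i + 1 < len(data):
            i += 2  # lead byte plus second byte form one character
        else:
            if data[i] == target:
                count += 1
            i += 1
    return count


def count_naive(data: bytes, target: int) -> int:
    """Count every byte equal to `target`, ignoring character boundaries."""
    return data.count(bytes([target]))


# 0x95 0x5C is one Shift-JIS double-byte character whose second byte
# equals the ASCII backslash ($5C); it is followed by CR, 'A', '\', CR.
text = bytes([0x95, 0x5C, 0x0D, 0x41, 0x5C, 0x0D])

print(count_walk(text, 0x0D), count_naive(text, 0x0D))  # 2 2
print(count_walk(text, 0x5C), count_naive(text, 0x5C))  # 1 2
```

Searching byte by byte is therefore safe only for values below $40; for anything from $40 up, the scan has to track character boundaries, which is where a script's high-/low-byte table comes in.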
To see exactly which byte values are permitted for the particular script you are working with, call the parseTable Script Manager routine to obtain that script's table of high-byte/low-byte values.
The difficulty of dealing with control characters in scripts will disappear once Unicode (which uses 16-bit characters whose two bytes can take any combination of values) is in widespread use. However, because in 16-bit Unicode a $0D byte can be half of an unrelated character, there are fundamental compatibility problems with our system software and with any application that assumes $0D is always a <CR>, so Unicode will never be a 'script system'. Instead, it will probably be an alternate encoding platform with entirely new rules. There is no reason to plan extensively for Unicode at present, but you should make as few assumptions as possible in your code; this will minimize the effort required to make it compatible with Unicode in the future.
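To illustrate why byte-oriented code cannot treat 16-bit Unicode as just another script, here is a small Python sketch. It assumes big-endian 16-bit code units (UTF-16BE, which for the characters used here is the same as 16-bit Unicode) and uses U+010D (c with caron) as an example character whose encoding happens to contain a $0D byte.

```python
# U+010D (c with caron) followed by a real carriage return, U+000D,
# encoded as big-endian 16-bit units (assumed UTF-16BE).
text = "\u010d\r"
data = text.encode("utf-16-be")  # b'\x01\x0d\x00\x0d'

# A byte-oriented scan, as in code written for 1- and 2-byte scripts,
# sees two $0D bytes, one of which is half of U+010D.
byte_hits = data.count(b"\x0d")

# Scanning whole 16-bit code units finds only the real carriage return.
units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
unit_hits = units.count(0x000D)

print(byte_hits, unit_hits)  # 2 1
```

Code that already works in terms of characters rather than raw bytes needs far fewer changes when the unit size changes, which is the practical payoff of making as few encoding assumptions as possible.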
To obtain a more in-depth understanding of international character-set
encoding, software localization, and Unicode, locate a copy of Guide to
Macintosh Software Localization (an Addison-Wesley publication). While this
is available as soft copy on one of the developer CDs, you may find that some
of the content won't display properly unless you have all of the appropriate
fonts installed, so it might be best to obtain a printed copy.