- Guidelines for character mnemonics in a minimal character set.
-
- By Keld Simonsen, Danish UNIX User Group (DKUUG)
- Representative to SC22 WG on Character Set Usage
- for Danish Standards Association (DS), Denmark.
-
- Draft January 1991.
-
- Aim of Character Mnemonics
-
- The aim of the mnemonics is to be able to represent all characters
- in all standard coded character sets in any standard coded
- character set. Thus all standard coded character sets will be
- related, and a conversion can take place.
-
- The character mnemonics are primarily intended for use within
- computer operating systems, programming languages and applications.
- This document describes the current state of the work, which has
- been presented to the ISO working group responsible for these
- computer related issues, namely the
- ISO/IEC JTC1/SC22 special working group on character set usage.
-
- Covered Coded Character Sets
-
- Almost all characters in the standard coded character sets have been
- given a mnemonic name in the minimal character set.
- The minimal character set is defined as the basic character set
- of ISO 646, where 12 positions are left undefined.
- The standard coded character sets are taken as the sum of
- all ISO defined or ISO registered character sets.
-
- The most significant ISO coded character set is ISO 10646, which
- aims to code all the characters of the world in 32 bits.
- These guidelines can be seen as assigning mnemonic attributes
- to most characters in ISO 10646, currently at DIS stage.
-
- Other ISO coded character sets covered include all parts of
- ISO 8859, ISO 6937-2 and all ISO 646 conforming coded character
- sets in the ISO character set registry managed by ECMA
- according to ISO 4873.
- Some non-ISO character sets are also covered for convenience.
-
- The Character Mnemonics Classes
-
- The character mnemonics are classified into two groups:
-
- 1. A group with two-character mnemonics
- - Primarily intended for alphabetic scripts like Latin, Greek,
- Cyrillic, Hebrew and Arabic, and special characters.
- 2. A group with variable-length mnemonics
- - Primarily intended for non-alphabetic scripts like Japanese
- and Chinese.
-
- All mnemonics are given a long descriptive name, written in the
- reference character set and taken from ISO 10646, if possible.
-
-
- The Two-Character mnemonics
-
- The two-character mnemonics include various accented Latin letters,
- Greek, Cyrillic, Hebrew, Arabic, Hiragana, Katakana and Bopomofo.
- Quite a few special characters are also included.
- Almost all ISO or ISO registered 7- and 8-bit coded
- character sets are covered with these two-character mnemonics.
- Thus conversions between these character sets can be done via a
- two-character conversion table.
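-
- A minimal sketch of such a table-driven conversion, using the mnemonic
- as the pivot between two coded character sets. The three-entry tables
- and the function name convert are illustrative assumptions, not part of
- these guidelines; the mnemonics are formed as letter plus ":" for
- diaeresis, and bytes missing from the source table are assumed to be
- common to both sets and are passed through unchanged.

```python
# Hypothetical partial tables: byte value -> two-character mnemonic.
# Real tables would cover every character of each coded character set.
DIN_66003 = {0x7B: "a:", 0x7C: "o:", 0x7D: "u:"}    # German ISO 646 variant
ISO_8859_1 = {0xE4: "a:", 0xF6: "o:", 0xFC: "u:"}   # Latin alphabet No. 1


def convert(data: bytes, source: dict, target: dict) -> bytes:
    """Convert bytes from one character set to another via mnemonics."""
    # Invert the target table: mnemonic -> byte value.
    to_byte = {mnemonic: byte for byte, mnemonic in target.items()}
    out = bytearray()
    for byte in data:
        mnemonic = source.get(byte)
        # Assumption: bytes without a table entry mean the same in both sets.
        out.append(byte if mnemonic is None else to_byte[mnemonic])
    return bytes(out)
```

- For example, convert(b"sch|n", DIN_66003, ISO_8859_1) yields the bytes
- b"sch\xf6n", and swapping the two tables converts back.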
-
- The two characters are chosen so that their graphical appearance in the
- reference set resembles as closely as possible (within the possibilities
- available) the graphical appearance of the character. The basic character
- set of ISO 646 is used as the reference set, as mentioned above.
-
- The characters in the reference character set are chosen to represent
- themselves. You may consider them as two-character mnemonics where
- the second character is a space.
-
- Control character mnemonics are chosen according to ISO 2047 and ISO 6429.
-
- Letters, including Greek, Cyrillic, Arabic and Hebrew, are represented
- with the base letter as the first letter, and the second letter
- represents an accent or relation to a non-Latin script.
- Non-Latin letters are transliterated to Latin letters,
- following transliteration standards as closely as possible.
-
- After a letter, the second character signifies the following:
-
-   Exclamation mark    !   Grave accent
-   Apostrophe          '   Acute accent
-   Greater-Than sign   >   Circumflex accent
-   Question mark       ?   Tilde
-   Hyphen-Minus        -   Macron
-   Left parenthesis    (   Breve
-   Full stop           .   Dot above/Ring above
-   Colon               :   Diaeresis
-   Comma               ,   Cedilla
-   Underline           _   Underline
-   Solidus             /   Stroke
-   Quotation mark      "   Double acute accent
-   Semicolon           ;   Ogonek
-   Less-Than sign      <   Caron
-
-   Equals              =   Cyrillic
-   Asterisk            *   Greek
-   Percent sign        %   Greek/Cyrillic special
-   Plus                +   smalls: Arabic, capitals: Hebrew
-   Four                4   Bopomofo
-   Five                5   Hiragana
-   Six                 6   Katakana
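-
- The table above can be sketched as a lookup that turns a letter mnemonic
- into a description in words. The function name describe and the wording
- of the descriptions are assumptions made for illustration; they are not
- the long descriptive names taken from ISO 10646.

```python
# Second-character meanings, copied from the table above.
ACCENTS = {
    "!": "grave accent", "'": "acute accent", ">": "circumflex accent",
    "?": "tilde", "-": "macron", "(": "breve",
    ".": "dot above or ring above", ":": "diaeresis", ",": "cedilla",
    "_": "underline", "/": "stroke", '"': "double acute accent",
    ";": "ogonek", "<": "caron",
}
SCRIPTS = {
    "=": "Cyrillic", "*": "Greek", "%": "Greek/Cyrillic special",
    "4": "Bopomofo", "5": "Hiragana", "6": "Katakana",
}


def describe(mnemonic: str) -> str:
    """Describe a two-character letter mnemonic such as "e'" or "b*"."""
    if (len(mnemonic) != 2 or not mnemonic[0].isascii()
            or not mnemonic[0].isalpha()):
        raise ValueError(f"not a letter mnemonic: {mnemonic!r}")
    base, mark = mnemonic
    if mark in ACCENTS:
        case = "capital" if base.isupper() else "small"
        return f"Latin {case} letter {base} with {ACCENTS[mark]}"
    if mark == "+":
        # Per the table: small letters are Arabic, capitals are Hebrew.
        script = "Hebrew" if base.isupper() else "Arabic"
        return f"{script} letter transliterated as {base}"
    if mark in SCRIPTS:
        return f"{SCRIPTS[mark]} letter transliterated as {base}"
    raise ValueError(f"unknown second character: {mark!r}")
```

- For example, describe("e'") gives "Latin small letter e with acute accent"
- and describe("b*") gives "Greek letter transliterated as b".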
-
- The ampersand & is reserved as an intro character, indicating that the
- following string is in the mnemonic character set. This character
- could also be another character, e.g. in the control character set.
- One common choice in the control character set is decimal 29,
- which seems to have no effect on almost all current equipment.
- The intro character can be negotiated between the communicating parties,
- but the default is the ampersand "&". Two intro characters in a row
- signify the intro character itself.
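-
- The intro character rules can be sketched as a small decoder. Several
- things here are assumptions for illustration: that each intro character
- introduces exactly one two-character mnemonic, that the output is
- Unicode text, and that only a handful of accents (mapped to Unicode
- combining marks) are supported. The name decode is hypothetical.

```python
import unicodedata

# Unicode combining marks for a few accent characters from the table above.
COMBINING = {
    "!": "\u0300",  # grave accent
    "'": "\u0301",  # acute accent
    ">": "\u0302",  # circumflex accent
    "?": "\u0303",  # tilde
    ":": "\u0308",  # diaeresis
    ",": "\u0327",  # cedilla
    "<": "\u030c",  # caron
}


def decode(text: str, intro: str = "&") -> str:
    """Expand accented-letter mnemonics; a doubled intro is a literal intro."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != intro:
            out.append(text[i])
            i += 1
        elif text[i + 1 : i + 2] == intro:
            # Two intro characters in a row signify the intro itself.
            out.append(intro)
            i += 2
        else:
            mnemonic = text[i + 1 : i + 3]
            if len(mnemonic) != 2 or mnemonic[1] not in COMBINING:
                raise ValueError(f"unsupported mnemonic at offset {i}")
            base, mark = mnemonic
            # Compose base letter and combining mark into one character.
            out.append(unicodedata.normalize("NFC", base + COMBINING[mark]))
            i += 3
    return "".join(out)
```

- With the default intro, decode("AT&&T") returns "AT&T". A negotiated
- intro such as decimal 29 is passed as intro="\x1d".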
-
- The underscore is reserved for the variable-length mnemonics.
- This does not rule out its use as an accent or language identifier.
- The right parenthesis ")" is not currently used as an accent or
- language identifier.
- This is also the case for some digits.
-
- Special characters are encoded with some mnemonic value.
- These are not systematic throughout, but most mnemonics start
- with a special character of the reference set.
- Special characters that relate in some way to a character of the
- reference character set normally have that character as the first
- character in the mnemonic.
-
-
- The Variable-length Character Mnemonics
-
- The Variable-length Character Mnemonics are primarily meant for the
- ideographic characters in larger Asian character sets.
- Short mnemonics are preferred, since they save storage and are
- easier to type in.
- The Chinese standard GB 2312-1980 and the Japanese standards
- JIS X0208 and JIS X0212 all identify characters by row and column
- numbers between 1 and 99. Two digits each for row and column, plus
- a character set identifier of one character, is therefore almost as
- short as possible. The following character set identifiers are defined:
-
- c GB 2312-1980
- j JIS X0208-1990
- J JIS X0212-1990
- k KS C 5601-1987
-
- The first idea was to have a name in Latin describing the pronunciation,
- but according to Asian sources that is not possible.
-
- One prominent character in the reference character set is reserved
- for identifying variable-length mnemonics, namely the underscore "_".
- This character serves as a delimiter at both the start and the end
- of the mnemonic. An example of its use, with "&" as the intro character:
-
- &_j3210_ &_j4436_&_j6530_
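-
- A minimal sketch of reading such mnemonics: each one is split into its
- character set identifier, row and column. The function names and the
- regular expression are assumptions for illustration. The to_unicode
- helper further assumes that row and column are the 1 to 94 numbers of
- the national standard, so that adding 0xA0 to each gives the EUC byte
- pair understood by Python's gb2312, euc_jp and euc_kr codecs; the
- identifier J (JIS X0212) is parsed but not converted here.

```python
import re

# Python codec names for the sets this sketch can convert (assumption).
EUC_CODECS = {"c": "gb2312", "j": "euc_jp", "k": "euc_kr"}


def find_all(text: str, intro: str = "&") -> list:
    """Return (identifier, row, column) for each variable-length mnemonic."""
    pattern = re.compile(re.escape(intro) + r"_([cjJk])(\d{2})(\d{2})_")
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in pattern.finditer(text)
    ]


def to_unicode(identifier: str, row: int, column: int) -> str:
    """Map a row/column pair to a Unicode character via its EUC encoding."""
    if identifier not in EUC_CODECS:
        raise ValueError(f"no codec for identifier {identifier!r}")
    if not (1 <= row <= 94 and 1 <= column <= 94):
        raise ValueError("row and column must be between 1 and 94")
    # EUC form: each of the two bytes is 0xA0 plus the row or column number.
    return bytes([0xA0 + row, 0xA0 + column]).decode(EUC_CODECS[identifier])
```

- Applied to the example line above, find_all returns
- [("j", 32, 10), ("j", 44, 36), ("j", 65, 30)]. This sketch does not
- handle a doubled intro character directly before an underscore.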
-
- The Variable-Length Character Mnemonics can also be used for less-used
- Latin letters with more than one accent or other less-used special characters.
-