- ======================================================================
- Unicode 1.0.1 Addendum 92.11.03 8:52
-
-
- UNICODE 1.0.1
-
- The following document is an ASCII version of the Unicode 1.0.1
- addendum, which has been added to Volumes 1 and 2 of The Unicode Standard.
- The original formatting has been lost. Because the original text contains
- non-ASCII characters, a dollar sign ($) is used as a placeholder for each
- of them, and the text has been modified slightly for readability.
-
- Printed copies of the addendum will be sent to Unicode corporate,
- associate and individual members. Others may get a printed copy by
- sending a stamped, self-addressed envelope to the Unicode Consortium
- at the address below, or may get a fax copy on request. Copies of the
- ASCII version of this document can also be obtained by anonymous FTP
- from Unicode.Org.
-
- ________________________________________________________________________
-
- Recipient is granted the right to make copies in any form for internal
- distribution and to freely use the information supplied for the purposes of
- creating and implementing products that comply with the Unicode Standard.
-
- The authors and publishers have taken care in preparation of this work, but
- make no expressed or implied warranty of any kind and assume no responsibility
- for errors or omissions. No liability is assumed for incidental or
- consequential damages in connection with or arising out of the use of the
- information or programs contained herein.
-
- Copyright (c) 1991-1992, Unicode, Inc. All Rights reserved. Unicode (tm) is a
- registered trademark of Unicode, Inc.
-
- ________________________________________________________________________
-
- 1. Introduction
-
- As discussed in Volumes 1 and 2, small changes have been made to Unicode
- 1.0 in order to incorporate it into the international character encoding
- standard, ISO 10646, which was approved by ISO as an International
- Standard in June, 1992. The Unicode Consortium plans to issue Unicode
- 1.1 in early 1993. The character content and encoding will be identical
- to that of ISO 10646. To that end, Unicode 1.1 will include
- approximately 5,400 additional characters from ISO 10646 that are not
- already in Unicode 1.0.
-
- In order to expedite use of Unicode in the interim, the Unicode
- Consortium is issuing an intermediate version, Unicode 1.0.1, which
- consists of Unicode 1.0 modified by the changes necessary to make the
- character codes a proper subset of ISO 10646.
-
- This paper describes the differences between Unicode 1.0.1 and Unicode
- 1.0 (for more information, see Volume 1, pp. xix-xx and Volume 2, pp.
- 4-9 and 427-431). Implementations that use Unicode 1.0.1 as thus defined
- will be completely compatible with Unicode 1.1, and therefore fully
- compatible with ISO 10646.
-
- Mapping of Unicode characters to the national and industry standards
- will be finalized in Unicode 1.1 to reflect comments from reviewers and
- alignment with ISO 10646. In early 1993 a technical report will be
- issued that defines the content of Unicode 1.1, including the complete
- revised mapping tables. The mapping tables will be available in soft
- form by anonymous FTP. The technical report will be sent to members of
- the Unicode Consortium (including associate and individual members);
- others may obtain copies or information about FTP by contacting:
-
- The Unicode Consortium
- 1965 Charleston Road
- Mountain View, California 94043 USA
-
- E-mail: unicode-inc@hq.metaphor.com
- Phone: (415) 961-4189
- Fax: (415) 966-1637
-
-
- 2. Final Zone Allocations
-
- The following zone reallocations do not affect any allocated Unicode 1.0
- characters.
-
- A. Unicode Allocation
-    Range               Cells    Name/Contents
-    U+0000 => U+4DFF    19,968   A-ZONE  Alphabets, syllabaries, symbols
-                                 (the 65 control codes are excluded)
-    U+4E00 => U+9FFF    20,992   I-ZONE  Ideographs
-    U+A000 => U+DFFF    16,384   O-ZONE  Reserved for future assignment
-    U+E000 => U+FFFF     8,192   R-ZONE  Restricted use
-                                 (FFFE & FFFF are excluded)
-
- B. R-ZONE Allocation
-    Range               Cells    Name/Contents
-    U+E000 => U+F8FF     6,400   Private Use Area
-                                 (Corporate Use starts at F8FF)
-    U+F900 => U+FFEF     1,776   Compatibility Zone
-                                 (including presentation forms)
-    U+FFF0 => U+FFFF        16   Specials
-                                 (FFFE & FFFF are not character codes,
-                                 and are excluded)
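
The cell counts above follow directly from the range boundaries. The following Python sketch recomputes them and looks up the zone of a code point. The names ZONES, R_ZONE_PARTS, cells and zone_of are illustrative only and do not appear in the standard; the counts are raw range sizes, so the excluded control codes and FFFE/FFFF are not subtracted.

```python
# Zone boundaries copied from the tables above. All identifiers here are
# illustrative assumptions, not names defined by the Unicode Standard.
ZONES = [
    (0x0000, 0x4DFF, "A-ZONE"),
    (0x4E00, 0x9FFF, "I-ZONE"),
    (0xA000, 0xDFFF, "O-ZONE"),
    (0xE000, 0xFFFF, "R-ZONE"),
]

R_ZONE_PARTS = [
    (0xE000, 0xF8FF, "Private Use Area"),
    (0xF900, 0xFFEF, "Compatibility Zone"),
    (0xFFF0, 0xFFFF, "Specials"),
]


def cells(first, last):
    """Number of code positions in the inclusive range first..last."""
    return last - first + 1


def zone_of(code_point):
    """Return the zone name for a 16-bit code point."""
    for first, last, name in ZONES:
        if first <= code_point <= last:
            return name
    raise ValueError("code point outside U+0000..U+FFFF")


if __name__ == "__main__":
    for first, last, name in ZONES + R_ZONE_PARTS:
        print(f"U+{first:04X} => U+{last:04X} {cells(first, last):>6,} {name}")
```

For example, zone_of(0x4E00) gives "I-ZONE", and the three R-ZONE parts together account for all 8,192 cells of the R-ZONE.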
-
- 3. Characters deleted or withdrawn for further study
-
- A. Groups of characters deleted
-    Range               Group Name
-    U+0E70 => U+0E74    Thai Phonetic Order Vowel signs
-    U+0EF0 => U+0EF4    Lao Phonetic Order Vowel signs
-    U+1000 => U+104C    Tibetan script
-
- B. Individual characters deleted
-    U+03DB    $    GREEK SMALL LETTER STIGMA
-    U+03DD    $    GREEK SMALL LETTER DIGAMMA
-    U+03DF    $    GREEK SMALL LETTER KOPPA
-    U+03E1    $    GREEK SMALL LETTER SAMPI
-    U+2300    $    APL COMPOSE
-    U+2301    $    APL OUT
- 4. Characters unified
-
-    From      With      Image   Old Name
-    U+0371    U+0314    $       GREEK NON-SPACING DASIA PNEUMATA
-    U+0372    U+0313    $       GREEK NON-SPACING PSILI PNEUMATA
-    U+0384    U+030D    $       GREEK NON-SPACING TONOS
-    U+04C5    U+049A    $       CYRILLIC CAPITAL LETTER KA OGONEK
-    U+04C6    U+049B    $       CYRILLIC SMALL LETTER KA OGONEK
-    U+04C9    U+04B2    $       CYRILLIC CAPITAL LETTER KHA OGONEK
-    U+04CA    U+04B3    $       CYRILLIC SMALL LETTER KHA OGONEK
-    U+3004    U+4EDD    $       IDEOGRAPHIC DITTO MARK
-
- 5. Characters moved
-
-    From      To        Image   Old Name
-    U+0370    U+0345    $       GREEK NON-SPACING IOTA BELOW
-    U+0385    U+0344    $       GREEK NON-SPACING DIAERESIS TONOS
-    U+03D7    U+037E    $       GREEK QUESTION MARK
-    U+03D8    U+0374    $       GREEK UPPER NUMERAL SIGN
-    U+03D9    U+0375    $       GREEK LOWER NUMERAL SIGN
-    U+03F3    U+0384    $       GREEK SPACING TONOS
-    U+03F4    U+0385    $       GREEK SPACING DIAERESIS TONOS
-    U+03F5    U+037A    $       GREEK SPACING IOTA BELOW
-    U+05F5    U+FB1E    $       HEBREW POINT VARIKA
-    U+32FF    U+3004    $       JAPANESE INDUSTRIAL STANDARD SYMBOL
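
Sections 4 and 5 together amount to a code point remapping from Unicode 1.0 to Unicode 1.0.1. Some targets of one row are sources of another (for example, U+03F3 moves to U+0384, while the old U+0384 is unified with U+030D), so the remapping has to be applied to every character simultaneously rather than row by row. The Python sketch below does this with str.translate. The names UNIFIED, MOVED, TABLE and to_unicode_1_0_1 are illustrative, and the input is assumed to be Unicode 1.0 text.

```python
# Rows copied from sections 4 (unified) and 5 (moved). Identifiers are
# illustrative assumptions; input is assumed to be Unicode 1.0 text.
UNIFIED = {
    0x0371: 0x0314, 0x0372: 0x0313, 0x0384: 0x030D,
    0x04C5: 0x049A, 0x04C6: 0x049B, 0x04C9: 0x04B2,
    0x04CA: 0x04B3, 0x3004: 0x4EDD,
}
MOVED = {
    0x0370: 0x0345, 0x0385: 0x0344, 0x03D7: 0x037E,
    0x03D8: 0x0374, 0x03D9: 0x0375, 0x03F3: 0x0384,
    0x03F4: 0x0385, 0x03F5: 0x037A, 0x05F5: 0xFB1E,
    0x32FF: 0x3004,
}

# One combined table: str.translate looks up each input character exactly
# once, so a remapped character is never remapped a second time.
TABLE = {**UNIFIED, **MOVED}


def to_unicode_1_0_1(text):
    """Remap unified and moved Unicode 1.0 code points in one pass."""
    return text.translate(TABLE)
```

Applying the rows sequentially with repeated replace calls would be wrong here: a character moved to U+0384 or U+3004 could then be caught by the unification row for that same code point.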
-
- 6. Character blocks rearranged
-
- The explicit list will be in Unicode 1.1.
-    Range               Group Name
-    U+32D0 => U+32FE    Circled Katakana: The 1.1 characters will be
-                        arranged in modern order:
-                        e.g., A, I, U, E, O, KA, KI, ...
-    U+FE80 => U+FEFC    Basic glyphs for Arabic language: The 1.1
-                        character shapes will be arranged in a different
-                        order: Isolate, Final, Initial, Medial
-
- 7. Character semantics changed
-
- A. Zero Width Joining
- U+200C $J ZERO WIDTH NON-JOINER
- U+200D $J ZERO WIDTH JOINER
-
- In the merger with ISO 10646, the semantics of these two characters have
- been given a narrow interpretation. This brings added precision to the
- explanation given in Volume 1, page 77.
-
- The intent of these characters is to address cursive graphical
- connection between the glyphs of a script, e.g. in scripts like Arabic
- whose printed form emulates handwriting. NON-JOINER and JOINER are best
- thought of as behaving like tiny letters that neighboring glyphs may
- connect to (JOINER) or avoid connecting to (NON-JOINER). They are thus
- processed as ordinary cursive letters rather than as control characters.
- NON-JOINER and JOINER affect how the two neighboring glyphs connect to
- them, not to each other. As such, they have no direct relationship with
- ligature formation; in particular, JOINER does not in any way request
- that its two neighbors be ligatured to each other. Indeed, both NON-
- JOINER and JOINER may break up ligatures by interrupting the character
- sequence required to form the ligature.
-
- The precise relationship between cursive appearance and ligatured
- appearance may differ from script to script, and therefore the precise
- usage of these characters is script-dependent. In the case of Latin
- typography, cursiveness (handwriting emulation) and ligaturing are
- independent. Thus the text on Volume 1, page 77, may be clarified as
- follows:
-
- f + JOINER + i will not form the ligature fi. Instead, if cursive
- versions of the f and i are available in the font, each will
- independently connect to the JOINER on the appropriate side (having the
- same appearance as f + i).
-
- Usage of optional ligatures such as fi is not controlled by any codes
- within the Unicode standard, but is determined by protocols or resources
- external to the text sequence.
-
- As further illustration, let a hyphen stand for a cursive connection to
- a preceding or following letter. Then in a cursive Latin font we would
- get the following results (with N standing for NON-JOINER and J for
- JOINER).
-
-    Unicodes       Rendering
-    f i s h        f- -i- -s- -h   (optionally using a fi- ligature)
-    f J i s h      f- -i- -s- -h
-    f N i s h      f i- -s- -h
-    f J N i s h    f- i- -s- -h
-    f N J i s h    f -i- -s- -h
-
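The table above can be reproduced with a very small model of the "tiny letter" interpretation. The Python sketch below is illustrative only: it assumes every character other than the two joining characters is a cursive letter, it ignores optional ligatures, and the name render_cursive is hypothetical. Each letter connects to a neighbor unless that neighbor is NON-JOINER or the edge of the text.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def render_cursive(text):
    """Show cursive connections as hyphens, one piece per visible letter.

    Simplified model: all characters except ZWNJ and ZWJ are treated as
    cursive letters, and ligatures are not considered.
    """
    pieces = []
    for i, ch in enumerate(text):
        if ch in (ZWNJ, ZWJ):
            continue  # the joining characters contribute no visible glyph
        prev = text[i - 1] if i > 0 else None
        nxt = text[i + 1] if i + 1 < len(text) else None
        # A letter connects to any neighbor except NON-JOINER; JOINER is
        # connected to exactly like an ordinary cursive letter.
        left = "-" if prev is not None and prev != ZWNJ else ""
        right = "-" if nxt is not None and nxt != ZWNJ else ""
        pieces.append(left + ch + right)
    return " ".join(pieces)


if __name__ == "__main__":
    for sample in ("fish", "f\u200dish", "f\u200cish",
                   "f\u200d\u200cish", "f\u200c\u200dish"):
        print(render_cursive(sample))
```

Note that the decision for each letter depends only on its immediate neighbor, which matches the statement that NON-JOINER and JOINER affect how neighboring glyphs connect to them, not to each other.
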
- With regard to the Arabic script, the statements in Volume 1, page 77,
- remain correct. In Volume 2, page 390, Arabic rules L2 and L3, the
- JOINER can be used to get the appearance in parentheses.
-
- With regard to conjuncts in Indic scripts, the statements in Volume 1,
- pp. 53-56, and Volume 2, pp. 399-414, remain correct. However for
- clarity, in pp. 399-414 the term ligature should be replaced by the term
- conjunct.
-
- B. Byte Order Mark
- U+FEFF $J ZERO WIDTH NO-BREAK SPACE
-
- In addition to the meaning of BYTE ORDER MARK, as defined in Volume 1 of
- the Unicode standard, the code value U+FEFF may now also be used as ZERO
- WIDTH NO-BREAK SPACE (ZWNBSP). For convenience in discussion, it can
- also be referred to by this name (which is the ISO 10646/Unicode 1.1
- name for U+FEFF).
-
- ZWNBSP behaves like a U+00A0 NO-BREAK SPACE in that it indicates the
- absence of word boundaries; however, ZWNBSP has no width. For example,
- this character can be inserted after the fourth character in the text
- "base+delta" to indicate that there should be no line break between the
- "e" and the "+" (for more information, see Volume 2, pp. 6-7).
-
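As a small illustration of the "base+delta" example, the following Python sketch inserts U+FEFF after the fourth character. The helper name forbid_break is hypothetical, and the sketch only builds the character sequence; whether a line breaker honors it depends on the rendering software.

```python
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE (also the BYTE ORDER MARK)


def forbid_break(text, index):
    """Insert ZWNBSP before text[index] to mark that no break may occur there."""
    return text[:index] + ZWNBSP + text[index:]


# Place the character between the "e" and the "+" of "base+delta".
glued = forbid_break("base+delta", 4)
```

The result has eleven characters, and removing U+FEFF again yields the original ten-character string.
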
- 8. Characters added
-
- A large number of characters will be added in Unicode 1.1; they will be
- listed in the technical report, as explained above. They will include
- the following characters, which were omitted from Unicode 1.0.
-
- U+0A4D $ GURMUKHI SIGN VIRAMA
- U+0A8D $ GUJARATI VOWEL CANDRA E
- U+0A91 $ GUJARATI VOWEL CANDRA O
- U+0AC9 $ GUJARATI VOWEL SIGN CANDRA O
- U+0B56 $ ORIYA AI LENGTH MARK
- U+25EF $ LARGE CIRCLE
- U+FFE8 $ HALFWIDTH FORMS LIGHT VERTICAL
- U+FFE9 $ HALFWIDTH LEFTWARDS ARROW
- U+FFEA $ HALFWIDTH UPWARDS ARROW
- U+FFEB $ HALFWIDTH RIGHTWARDS ARROW
- U+FFEC $ HALFWIDTH DOWNWARDS ARROW
- U+FFED $ HALFWIDTH BLACK SQUARE
- U+FFEE $ HALFWIDTH WHITE CIRCLE
-
- 9. Character mapping changed
-
-    From      To        Image   XJIS   Name
-    U+00AD    U+2010    $       815D   JIS HYPHEN
-    U+20DD    U+25EF    $       81FC   JIS COMPOSITION CIRCLE
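
An implementation that keeps its own XJIS-to-Unicode conversion table could apply these two changes as a patch. The Python sketch below assumes a table represented as a dictionary from XJIS code to Unicode code point; that representation, the two-entry excerpt, and the names MAPPING_CHANGES and apply_mapping_changes are all hypothetical.

```python
# (XJIS code, old Unicode 1.0 target, new Unicode 1.0.1 target), copied
# from the table above. All identifiers are illustrative assumptions.
MAPPING_CHANGES = [
    (0x815D, 0x00AD, 0x2010),  # JIS HYPHEN
    (0x81FC, 0x20DD, 0x25EF),  # JIS COMPOSITION CIRCLE
]


def apply_mapping_changes(table):
    """Return a copy of an XJIS-to-Unicode table with the changes applied.

    An entry is updated only if it still holds the old target, so a table
    that has already been patched is returned unchanged.
    """
    updated = dict(table)
    for xjis, old, new in MAPPING_CHANGES:
        if updated.get(xjis) == old:
            updated[xjis] = new
    return updated


# Hypothetical two-entry excerpt of a Unicode 1.0 era mapping table.
old_table = {0x815D: 0x00AD, 0x81FC: 0x20DD}
new_table = apply_mapping_changes(old_table)
```

Checking the old target before overwriting makes the patch safe to run more than once.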
-
-
-
-
- Volume 2 Errata
-
- 1. Page 6
- Change in lines 26, 27: ... ZERO WIDTH SPACE can be used to indicate
- word boundaries in scripts like Thai...
-
- 2. Page 19
- The glyphs in Figures 2-14 and 2-15 were printed incorrectly. The 4
- correct glyphs are:
-    Figure   Image on Left   Image on Right
-    2-14     $               $
-    2-15     $               $
-
- 3. Pages 60, 66, 75, 79, 91, 131, 135, 140, 143, 150, 264, 277, 301, 311, 343
- A number of glyphs were printed incorrectly in various places in
- Volume 2. The most serious are:
-    Code      Image   Pages
-    U+71F7    $       60, 131, 264
-    U+773E    $       66, 135, 277
-    U+809C    $       75, 140, 301
-    U+8480    $       79, 143, 311
-    U+908E    $       91, 150, 343
-
- 4. Page 401
- Change wording and rule in C3: ...The dead consonant RAd changes to a
- non-spacing mark RAx when followed by a consonant cluster. The...
- RAn + VIRAMAn => RAx
-
- 5. Page 403
- Add L1a: The ZERO WIDTH JOINER can be used to produce the so-called
- eyelash-RA (RAh) used in Marathi. RAh is a spacing half-consonant which
- is not subject to special ordering of RAx (O2).
- RAn + ZWJ + VIRAMAn => RAx
-
- 6. Page 404
- Change O2 to:
- RAx + Cluster => Cluster + RAx
- In processing a line of glyphs, this rule is not applied twice to the
- same RAx.
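
The restriction that the rule is not applied twice to the same RAx can be sketched as a single left-to-right pass. The Python sketch below is a simplification under stated assumptions: a line of glyphs is modeled as a list of strings, the string "RAx" stands for the non-spacing mark, and every other string is assumed to be an already formed cluster. The function name apply_o2 is hypothetical.

```python
def apply_o2(glyphs):
    """Apply RAx + Cluster => Cluster + RAx once per RAx.

    Assumption: each list item other than "RAx" is one complete cluster.
    """
    out = list(glyphs)
    i = 0
    while i < len(out) - 1:
        if out[i] == "RAx" and out[i + 1] != "RAx":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # step past the moved RAx so it is not reordered again
        else:
            i += 1
    return out
```

Without the step past the moved RAx, the mark would keep swapping with each following cluster and drift to the end of the line, which is the behavior the added sentence rules out.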
-
- 7. Page 429
- Line 7 has the period misplaced, and should read:
- Visual: .KO ,bmw 500 A SI TI