home *** CD-ROM | disk | FTP | other *** search
-
-
-
-
-
-
- Network Working Group N. Freed
- Request for Comments: 2278 Innosoft
- BCP: 19 J. Postel
- Category: Best Current Practice ISI
- January 1998
-
- IANA Charset
- Registration Procedures
-
- Status of this Memo
-
- This document specifies an Internet Best Current Practices for the
- Internet Community, and requests discussion and suggestions for
- improvements. Distribution of this memo is unlimited.
-
- Copyright Notice
-
- Copyright (C) The Internet Society (1998). All Rights Reserved.
-
- 1. Abstract
-
- MIME [RFC-2045, RFC-2046, RFC-2047, RFC-2184] and various other
- modern Internet protocols are capable of using many different
- charsets. This in turn means that the ability to label different
- charsets is essential. This registration procedure exists solely to
- associate a specific name or names with a given charset and to give
- an indication of whether or not a given charset can be used in MIME
- text objects. In particular, the general applicability and
- appropriateness of a given registered charset is a protocol issue,
- not a registration issue, and is not dealt with by this registration
- procedure.
-
- 2. Definitions and Notation
-
- The following sections define various terms used in this document.
-
- 2.1. Requirements Notation
-
- This document occasionally uses terms that appear in capital letters.
- When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
- appear capitalized, they are being used to indicate particular
- requirements of this specification. A discussion of the meanings of
- these terms appears in [RFC-2119].
-
-
-
-
-
-
-
-
- Freed & Postel Best Current Practice [Page 1]
-
- RFC 2278 Charset Registration January 1998
-
-
- 2.2. Character
-
- A member of a set of elements used for the organisation, control, or
- representation of data.
-
- 2.3. Charset
-
- The term "charset" (see historical note below) is used here to refer
- to a method of converting a sequence of octets into a sequence of
- characters. This conversion may also optionally produce additional
- control information such as directionality indicators.
-
- Note that unconditional and unambiguous conversion in the other
- direction is not required, in that not all characters may be
- representable by a given charset and a charset may provide more than
- one sequence of octets to represent a particular sequence of
- characters.
-
- This definition is intended to allow charsets to be defined in a
- variety of different ways, from simple single-table mappings such as
- US-ASCII to complex table switching methods such as those that use
- ISO 2022's techniques, to be used as charsets. However, the
- definition associated with a charset name must fully specify the
- mapping to be performed. In particular, use of external profiling
- information to determine the exact mapping is not permitted.
-
- HISTORICAL NOTE: The term "character set" was originally used in MIME
- to describe such straightforward schemes as US-ASCII and ISO-8859-1
- which consist of a small set of characters and a simple one-to-one
- mapping from single octets to single characters. Multi-octet
- character encoding schemes and switching techniques make the
- situation much more complex. As such, the definition of this term was
- revised to emphasize both the conversion aspect of the process, and
- the term itself has been changed to "charset" to emphasize that it is
- not, after all, just a set of characters. A discussion of these
- issues as well as specification of standard terminology for use in
- the IETF appears in RFC 2130.
-
- 2.4. Coded Character Set
-
- A Coded Character Set (CCS) is a mapping from a set of abstract
- characters to a set of integers. Examples of coded character sets are
- ISO 10646 [ISO-10646], US-ASCII [US-ASCII], and the ISO-8859 series
- [ISO-8859].
-
-
-
-
-
-
-
- Freed & Postel Best Current Practice [Page 2]
-
- RFC 2278 Charset Registration January 1998
-
-
- 2.5. Character Encoding Scheme
-
- A Character Encoding Scheme (CES) is a mapping from a Coded Character
- Set or several coded character sets to a set of octets. A given CES
- is typically associated with a single CCS; for example, UTF-8 applies
- only to ISO 10646.
-
- 3. Registration Requirements
-
- Registered charsets are expected to conform to a number of
- requirements as described below.
-
- 3.1. Required Characteristics
-
- Registered charsets MUST conform to the definition of a "charset"
- given above. In addition, charsets intended for use in MIME content
- types under the "text" top-level type must conform to the
- restrictions on that type described in RFC 2045. All registered
- charsets MUST note whether or not they are suitable for use in MIME.
-
- All charsets which are constructed as a composition of a CCS and a
- CES MUST either include the CCS and CES they are based on in their
- registration or else cite a definition of their CCS and CES that
- appears elsewhere.
-
- All registered charsets MUST be specified in a stable, openly
- available specification. Registration of charsets whose
- specifications aren't stable and openly available is forbidden.
-
- 3.2. New Charsets
-
- This registration mechanism is not intended to be a vehicle for the
- definition of entirely new charsets. This is due to the fact that the
- registration process does NOT contain adequate review mechanisims for
- such undertakings.
-
- As such, only charsets defined by other processes and standards
- bodies, or specific profiles of such charsets, are eligible for
- registration.
-
- 3.3. Naming Requirements
-
- One or more names MUST be assigned to all registered charsets.
- Multiple names for the same charset are permitted, but if multiple
- names are assigned a single primary name for the charset MUST be
- identified. All other names are considered to be aliases for the
- primary name and use of the primary name is preferred over use of any
- of the aliases.
-
-
-
- Freed & Postel Best Current Practice [Page 3]
-
- RFC 2278 Charset Registration January 1998
-
-
- Each assigned name MUST uniquely identify a single charset. All
- charset names MUST be suitable for use as the value of a MIME content
- type charset parameter and hence MUST conform to MIME parameter value
- syntax. This applies even if the specific charset being registered is
- not suitable for use with the "text" media type.
-
- Finally, charsets being registered for use with the "text" media type
- MUST have a primary name that conforms to the more restrictive syntax
- of the charset field in MIME encoded-words [RFC-2047, RFC-2184] and
- MIME extended parameter values [RFC-2184]. A combined ABNF definition
- for such names is as follows:
-
- mime-charset = 1*<Any CHAR except SPACE, CTLs, and cspecials>
-
- cspecials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
- <"> / "/" / "[" / "]" / "?" / "." / "=" / "*"
-
- CHAR = <any ASCII character> ; ( 0-177, 0.-127.)
- SPACE = <ASCII SP, space> ; ( 40, 32.)
- CTL = <any ASCII control ; ( 0- 37, 0.- 31.)
- character and DEL> ; ( 177, 127.)
-
- 3.4. Functionality Requirement
-
- Charsets must function as actual charsets: Registration of things
- that are better thought of as a transfer encoding, as a media type,
- or as a collection of separate entities of another type, is not
- allowed. For example, although HTML could theoretically be thought
- of as a charset, it is really better thought of as a media type and
- as such it cannot be registered as a charset.
-
- 3.5. Usage and Implementation Requirements
-
- Use of a large number of charsets in a given protocol may hamper
- interoperability. However, the use of a large number of undocumented
- and/or unlabelled charsets hampers interoperability even more.
-
- A charset should therefore be registered ONLY if it adds significant
- functionality that is valuable to a large community, OR if it
- documents existing practice in a large community. Note that charsets
- registered for the second reason should be explicitly marked as being
- of limited or specialized use and should only be used in Internet
- messages with prior bilateral agreement.
-
- 3.6. Publication Requirements
-
- Charset registrations can be published in RFCs, however, RFC
- publication is not required to register a new charset.
-
-
-
- Freed & Postel Best Current Practice [Page 4]
-
- RFC 2278 Charset Registration January 1998
-
-
- The registration of a charset does not imply endorsement, approval,
- or recommendation by the IANA, IESG, or IETF, or even certification
- that the specification is adequate. It is expected that applicability
- statements for particular applications will be published from time to
- time that recommend implementation of, and support for, charsets that
- have proven particularly useful in those contexts.
-
- 3.7. MIBenum Requirements
-
- Each registered charset MUST also be assigned a unique enumerated
- integer value. These "MIBenum" values are defined by and used in the
- Printer MIB [RFC-1759].
-
- A MIBenum value for each charset will be assigned by IANA at the time
- of registration.
-
- 4. Registration Procedure
-
- The following procedure has been implemented by the IANA for review
- and approval of new charsets. This is not a formal standards
- process, but rather an administrative procedure intended to allow
- community comment and sanity checking without excessive time delay.
-
- 4.1. Present the Charset to the Community
-
- Send the proposed charset registration to the "ietf-
- charsets@iana.org" mailing list. This mailing list has been
- established for the sole purpose of reviewing proposed charset
- registrations. Proposed charsets are not formally registered and must
- not be used; the "x-" prefix specified in RFC 2045 can be used until
- registration is complete.
-
- The intent of the public posting is to solicit comments and feedback
- on the definition of the charset and the name chosen for it over a
- two week period.
-
- 4.2. Charset Reviewer
-
- When the two week period has passed and the registration proposer is
- convinced that consensus has been achieved, the registration
- application should be submitted to IANA and the charset reviewer. The
- charset reviewer, who is appointed by the IETF Applications Area
- Director(s), either approves the request for registration or rejects
- it. Rejection may occur because of significant objections raised on
- the list or objections raised externally. If the charset reviewer
- considers the registration sufficiently important and controversial,
- a last call for comments may be issued to the full IETF. The charset
-
-
-
-
- Freed & Postel Best Current Practice [Page 5]
-
- RFC 2278 Charset Registration January 1998
-
-
- reviewer may also recommend standards track processing (before or
- after registration) when that appears appropriate and the level of
- specification of the charset is adequate.
-
- Decisions made by the reviewer must be posted to the ietf-charsets
- mailing list within 14 days. Decisions made by the reviewer may be
- appealed to the IESG.
-
- 4.3. IANA Registration
-
- Provided that the charset registration has either passed review or
- has been successfully appealed to the IESG, the IANA will register
- the charset, assign a MIBenum value, and make its registration
- available to the community.
-
- 5. Location of Registered Charset List
-
- Charset registrations will be posted in the anonymous FTP file
- "ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets" and all
- registered charsets will be listed in the periodically issued
- "Assigned Numbers" RFC [currently RFC-1700]. The description of the
- charset may also be published as an Informational RFC by sending it
- to "rfc-editor@isi.edu" (please follow the instructions to RFC
- authors [RFC-2223]).
-
- 6. Registration Template
-
- To: ietf-charsets@iana.org
- Subject: Registration of new charset
-
- Charset name(s):
-
- (All names must be suitable for use as the value of a MIME content-
- type parameter.)
-
- Published specification(s):
-
- (A specification for the charset must be openly available that
- accurately describes what is being registered. If a charset is
- defined as a composition of a CCS and a CES then these defintions
- must either be included or referenced.)
-
- Person & email address to contact for further information:
-
-
-
-
-
-
-
-
- Freed & Postel Best Current Practice [Page 6]
-
- RFC 2278 Charset Registration January 1998
-
-
- 7. Security Considerations
-
- This registration procedure is not known to raise any sort of
- security considerations that are appreciably different from those
- already existing in the protocols that employ registered charsets.
-
- 8. References
-
- [ISO-2022]
- International Standard -- Information Processing -- Character
- Code Structure and Extension Techniques, ISO/IEC 2022:1994, 4th
- ed.
-
- [ISO-8859]
- International Standard -- Information Processing -- 8-bit
- Single-Byte Coded Graphic Character Sets
- - Part 1: Latin Alphabet No. 1, ISO 8859-1:1987, 1st ed.
- - Part 2: Latin Alphabet No. 2, ISO 8859-2:1987, 1st ed.
- - Part 3: Latin Alphabet No. 3, ISO 8859-3:1988, 1st ed.
- - Part 4: Latin Alphabet No. 4, ISO 8859-4:1988, 1st ed.
- - Part 5: Latin/Cyrillic Alphabet, ISO 8859-5:1988, 1st
- ed.
- - Part 6: Latin/Arabic Alphabet, ISO 8859-6:1987, 1st ed.
- - Part 7: Latin/Greek Alphabet, ISO 8859-7:1987, 1st ed.
- - Part 8: Latin/Hebrew Alphabet, ISO 8859-8:1988, 1st ed.
- - Part 9: Latin Alphabet No. 5, ISO/IEC 8859-9:1989, 1st
- ed.
- International Standard -- Information Technology -- 8-bit
- Single-Byte Coded Graphic Character Sets
- - Part 10: Latin Alphabet No. 6, ISO/IEC 8859-10:1992,
- 1st ed.
-
- [ISO-10646]
- ISO/IEC 10646-1:1993(E), "Information technology --
- Universal Multiple-Octet Coded Character Set (UCS) --
- Part 1: Architecture and Basic Multilingual Plane",
- JTC1/SC2, 1993.
-
- [RFC-2048]
- Freed, N., Klensin, J., and J. Postel, "Multipurpose Internet
- Mail Extensions (MIME) Part Four: Registration Procedures", RFC
- 2048, November 1996.
-
- [RFC-1700]
- Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
- 1700, October 1994.
-
-
-
-
-
- Freed & Postel Best Current Practice [Page 7]
-
- RFC 2278 Charset Registration January 1998
-
-
- [RFC-1759]
- Smith, R., Wright, F., Hastings, T., Zilles, S., and J.
- Gyllenskog, "Printer MIB", RFC 1759, March 1995.
-
- [RFC-2045]
- Freed, N., and N. Borenstein, "Multipurpose Internet Mail
- Extensions (MIME) Part One: Format of Internet Message Bodies",
- RFC 2045, November 1996.
-
- [RFC-2046]
- Freed, N., and N. Borenstein, "Multipurpose Internet Mail
- Extensions (MIME) Part Two: Media Types", RFC 2046, November
- 1996.
-
- [RFC-2047]
- Moore, K., "Multipurpose Internet Mail Extensions (MIME) Part
- Three: Representation of Non-Ascii Text in Internet Message
- Headers", RFC 2047, November 1996.
-
- [RFC-2119]
- Bradner, S., "Key words for use in RFCs to Indicate Requirement
- Levels", BCP 14, RFC 2119, March 1997.
-
- [RFC-2130]
- Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson,
- R., Crispin, M., and P. Svanberg, "Report from the IAB Character
- Set Workshop", RFC 2130, April 1997.
-
- [RFC-2184]
- Freed, N., and K. Moore, "MIME Parameter Value and Encoded Word
- Extensions: Character Sets, Languages, and Continuations", RFC
- 2184, August 1997.
-
- [US-ASCII]
- Coded Character Set -- 7-Bit American Standard Code for
- Information Interchange, ANSI X3.4-1986.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Freed & Postel Best Current Practice [Page 8]
-
- RFC 2278 Charset Registration January 1998
-
-
- 9. Authors' Addresses
-
- Ned Freed
- Innosoft International, Inc.
- 1050 Lakes Drive
- West Covina, CA 91790
- USA
-
- Phone: +1 626 919 3600
- Fax: +1 626 919 3614
- EMail: ned.freed@innosoft.com
-
-
- Jon Postel
- USC/Information Sciences Institute
- 4676 Admiralty Way
- Marina del Rey, CA 90292
- USA
-
- Phone: +1 310 822 1511
- Fax: +1 310 823 6714
- EMail: Postel@ISI.EDU
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Freed & Postel Best Current Practice [Page 9]
-
- RFC 2278 Charset Registration January 1998
-
-
- Full Copyright Statement
-
- Copyright (C) The Internet Society (1998). All Rights Reserved.
-
- This document and translations of it may be copied and furnished to
- others, and derivative works that comment on or otherwise explain it
- or assist in its implementation may be prepared, copied, published
- and distributed, in whole or in part, without restriction of any
- kind, provided that the above copyright notice and this paragraph are
- included on all such copies and derivative works. However, this
- document itself may not be modified in any way, such as by removing
- the copyright notice or references to the Internet Society or other
- Internet organizations, except as needed for the purpose of
- developing Internet standards in which case the procedures for
- copyrights defined in the Internet Standards process must be
- followed, or as required to translate it into languages other than
- English.
-
- The limited permissions granted above are perpetual and will not be
- revoked by the Internet Society or its successors or assigns.
-
- This document and the information contained herein is provided on an
- "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
- TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
- BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
- HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
- MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Freed & Postel Best Current Practice [Page 10]
-
-