INTERNET DRAFT EXPIRES FEB 1998 INTERNET DRAFT
Network Working Group                          Microsoft Corporation
Internet Draft                                             K. Tamaru
Japanese Character Encoding for Internet Messages
<draft-rfced-info-tamaru-00.txt>
Status of This Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
``work in progress.''
To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa),
nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).
Distribution of this document is unlimited.
1. Abstract
This memo defines 'ISO-2022-JP', an encoding scheme for Japanese
characters that is used in electronic mail [RFC 822] and network
news [RFC 1036]. It also lists the Japanese character sets that can
be used in this encoding scheme.
2. Introduction
RFC 1468 defines how Japanese characters are encoded, as does this
memo. RFC 1468 specifies JIS X 0208 as the double-byte character set
in ISO-2022-JP text.
Today, many operating systems support proprietary extended Japanese
characters or JIS X 0212. This includes the Unicode character set,
which conforms to neither JIS X 0201 nor JIS X 0208. The limited
availability of Kanji characters in JIS X 0201 and JIS X 0208
therefore limits the ability to communicate precise information.
Fortunately, JIS (Japanese Industrial Standards) defines JIS X 0212
as "code of the supplementary Japanese graphic character set for
information interchange". Most Japanese characters used in regular
electronic mail can be accommodated in JIS X 0201, JIS X 0208, and
JIS X 0212.
It is also recognized that there is a trend toward Unicode; however,
Unicode is not yet widely used, and older electronic mail systems
have certain limitations. The purpose of this memo is therefore to
add the capability of writing out JIS X 0212.
Apart from JIS X 0212 support, this memo does not describe any
representation of ISO-2022-JP version information.
3. Description
In "ISO-2022-JP" text, the initial character code of the message is
ASCII. A "double-byte-seq" (see the "Formal Syntax" section), that
is, ESC "$" "B" / ESC "$" "@" / ESC "$" "(" "D", is the only
designator indicating that the characters that follow are
double-byte, and it remains in effect until another escape sequence
appears.
The end of "ISO-2022-JP" text must also be in ASCII. It is also
strongly recommended to switch back to ASCII, rather than to
JIS X 0201-Roman, at the end of each line that contains any
non-ASCII character. JIS X 0201-Roman is not identical to ASCII;
the two sets differ in two characters.
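The difference between the two single-byte sets can be illustrated
with a short Python sketch. The memo does not name the two differing
characters; the mapping below follows the JIS X 0201 standard (0x5C
is YEN SIGN and 0x7E is OVERLINE). The function name
decode_single_byte is hypothetical and used for illustration only.

```python
# Positions where JIS X 0201-Roman differs from ASCII (per JIS X 0201;
# the memo itself only says that two characters differ).
JIS_ROMAN_OVERRIDES = {
    0x5C: "\u00a5",  # YEN SIGN replaces REVERSE SOLIDUS (backslash)
    0x7E: "\u203e",  # OVERLINE replaces TILDE
}


def decode_single_byte(data: bytes, roman: bool) -> str:
    """Decode 7-bit bytes as ASCII, or as JIS X 0201-Roman if roman=True."""
    chars = []
    for byte in data:
        if roman and byte in JIS_ROMAN_OVERRIDES:
            chars.append(JIS_ROMAN_OVERRIDES[byte])
        else:
            chars.append(chr(byte))
    return "".join(chars)


if __name__ == "__main__":
    raw = b"C:\\tmp~"
    print(decode_single_byte(raw, roman=False))  # read as ASCII
    print(decode_single_byte(raw, roman=True))   # read as JIS X 0201-Roman
```

The same bytes display differently depending on which set is
designated, which is one reason to end each line in ASCII.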
The following table lists the escape sequences and character sets
that can be used in "ISO-2022-JP" text. The reg# column gives each
set's registration number in the ISO 2375 Register, which allows
double-byte ideographic scripts to be encoded within the ISO/IEC
2022 code structure.
reg#  character set      ESC sequence                  designated to
--------------------------------------------------------------------
6     ASCII              ESC 2/8 4/2       ESC ( B     G0
42    JIS X 0208-1978    ESC 2/4 4/0       ESC $ @     G0
87    JIS X 0208-1983    ESC 2/4 4/2       ESC $ B     G0
14    JIS X 0201-Roman   ESC 2/8 4/10      ESC ( J     G0
159   JIS X 0212-1990    ESC 2/4 2/8 4/4   ESC $ ( D   G0
Other restrictions are given in the Formal Syntax below.
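As an illustration of how the escape sequences in the table drive
decoding, the following Python sketch splits a byte string into runs
tagged with the character set in effect. It is a minimal sketch, not
a full decoder: the names DESIGNATIONS and split_segments are
hypothetical, and the raw bytes of each run are returned without
being mapped to characters.

```python
# Escape sequences from the table, mapped to character set names.
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
    b"\x1b(J": "JIS X 0201-Roman",
    b"\x1b$(D": "JIS X 0212-1990",
}


def split_segments(data: bytes) -> list[tuple[str, bytes]]:
    """Split ISO-2022-JP bytes into (character set, raw bytes) runs."""
    segments = []
    charset = "ASCII"  # the text starts in ASCII
    start = pos = 0
    while pos < len(data):
        if data[pos] != 0x1B:
            pos += 1
            continue
        for seq, name in DESIGNATIONS.items():
            if data.startswith(seq, pos):
                if pos > start:
                    segments.append((charset, data[start:pos]))
                charset = name  # stays in effect until the next escape
                pos += len(seq)
                start = pos
                break
        else:
            raise ValueError(f"unknown escape sequence at offset {pos}")
    if pos > start:
        segments.append((charset, data[start:pos]))
    return segments


if __name__ == "__main__":
    sample = b"Hi \x1b$B$3$s\x1b(B!"
    for name, raw in split_segments(sample):
        print(name, raw)
```

Escape sequences that are not in the table raise an error here; a
production decoder might prefer to recover instead.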
4. Formal Syntax
The notational conventions used here are identical to those used in
STD 11, RFC 822 [RFC822].
The * (asterisk) convention is as follows:
l*m something
meaning at least l and at most m something, with l and m taking
default values of 0 and infinity, respectively.
message             = headers 1*(CRLF text)
                      ; see also [MIME1] "body-part"
                      ; note: must end in ASCII
text                = *(single-byte-char *segment
                      single-byte-seq *single-byte-char)
headers             = <see [RFC822] "fields" and [MIME1]
                      "body-part">
segment             = single-byte-segment / double-byte-segment
single-byte-segment = single-byte-seq *single-byte-char
double-byte-segment = double-byte-seq *(one-of-94 one-of-94)
reset-seq           = ESC "(" ( "B" / "J" )
single-byte-seq     = ESC "(" ( "B" / "J" )
double-byte-seq     = (ESC "$" ( "@" / "B" )) /
                      (ESC "$" "(" "D" )
CRLF                = CR LF
                                                   ; ( Octal, Decimal.)
ESC                 = <ISO 2022 ESC, escape>       ; (    33,    27.)
SI                  = <ISO 2022 SI, shift-in>      ; (    17,    15.)
SO                  = <ISO 2022 SO, shift-out>     ; (    16,    14.)
CR                  = <ASCII CR, carriage return>  ; (    15,    13.)
LF                  = <ASCII LF, linefeed>         ; (    12,    10.)
one-of-94           = <any one of 94 values>       ; (41-176, 33.-126.)
one-of-96           = <any one of 96 values>       ; (40-177, 32.-127.)
7BIT                = <any 7-bit value>            ; ( 0-177,  0.-127.)
single-byte-char    = <any 7BIT, including bare CR & bare LF,
                      but NOT including CRLF, and not including
                      ESC, SI, SO>
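The main constraints expressed by the syntax can be checked with a
small state machine. The Python sketch below is an approximation
under stated assumptions, not a literal translation of the grammar:
it checks that only the listed escape sequences appear, that
double-byte segments consist of one-of-94 pairs, that single-byte
characters are 7-bit and exclude SI and SO, and that the text ends in
ASCII. The function name check_text is hypothetical.

```python
SINGLE_BYTE_SEQS = (b"\x1b(B", b"\x1b(J")
DOUBLE_BYTE_SEQS = (b"\x1b$@", b"\x1b$B", b"\x1b$(D")
ASCII_SEQ = b"\x1b(B"
SO, SI, ESC = 0x0E, 0x0F, 0x1B


def check_text(data: bytes) -> list[str]:
    """Return a list of rule violations; an empty list means none found."""
    problems = []
    current = ASCII_SEQ  # the text starts in ASCII
    double = False
    pos = 0
    while pos < len(data):
        byte = data[pos]
        if byte == ESC:
            seq = next((s for s in SINGLE_BYTE_SEQS + DOUBLE_BYTE_SEQS
                        if data.startswith(s, pos)), None)
            if seq is None:
                problems.append(f"offset {pos}: unsupported escape sequence")
                pos += 1
                continue
            current = seq
            double = seq in DOUBLE_BYTE_SEQS
            pos += len(seq)
        elif double:
            pair = data[pos:pos + 2]
            if len(pair) == 2 and all(0x21 <= b <= 0x7E for b in pair):
                pos += 2
            else:
                problems.append(f"offset {pos}: not a one-of-94 pair")
                pos += 1
        else:
            if byte > 0x7F or byte in (SO, SI):
                problems.append(f"offset {pos}: not a single-byte-char")
            pos += 1
    if current != ASCII_SEQ:
        problems.append("text does not end in ASCII")
    return problems


if __name__ == "__main__":
    print(check_text(b"Hi \x1b$B$3$s\x1b(B!"))  # well-formed sample
    print(check_text(b"\x1b$B$3"))              # never returns to ASCII
```

Collecting violations in a list, rather than raising on the first
one, makes the sketch usable as a diagnostic tool.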
5. Security Considerations
This draft does not address security issues.
6. MIME Considerations
The name to be used for the Japanese encoding scheme in content is
"ISO-2022-JP". When this name is used in a MIME message, it takes
the form:
Content-Type: text/plain; charset=iso-2022-jp
Since "ISO-2022-JP" is a 7bit encoding, it is unnecessary to encode
it in another format by specifying the "Content-Transfer-Encoding"
header. Also, applying Base64 or Quoted-Printable encoding may cause
today's software to fail to decode the message.
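The 7bit property can be observed with Python's standard email
package. This is a sketch under assumptions: the function name
build_message is hypothetical, and Python's "iso-2022-jp" codec
covers only ASCII, JIS X 0201-Roman, and JIS X 0208, so the sample
body avoids JIS X 0212 characters.

```python
from email.message import EmailMessage


def build_message(body: str) -> EmailMessage:
    """Build a text/plain message whose body is ISO-2022-JP encoded."""
    msg = EmailMessage()
    # Assumption: Python's "iso-2022-jp" codec has no JIS X 0212
    # support, so the body must stay within JIS X 0208.
    msg.set_content(body, charset="iso-2022-jp")
    return msg


if __name__ == "__main__":
    msg = build_message("こんにちは")
    print(msg["Content-Type"])
    print(msg["Content-Transfer-Encoding"])
    # Every octet of the serialized message is a 7-bit value.
    print(all(octet < 0x80 for octet in msg.as_bytes()))
```

Because the encoded body contains only 7-bit octets, the library can
label it "7bit" without applying Base64 or Quoted-Printable.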
"ISO-2022-JP" can be used in MIME headers. Also "ISO-2022-JP" text
can be used with Base64 or Quoted-Printable encoding.
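For header use, the following Python sketch encodes a header value as
an encoded-word in the style of [RFC-2047] and decodes it again. It
relies on the standard email.header module, which selects Base64 for
this charset; the function names encode_subject and decode_subject
are hypothetical.

```python
from email.header import Header, decode_header


def encode_subject(text: str) -> str:
    """Encode header text as an ISO-2022-JP encoded-word."""
    return Header(text, charset="iso-2022-jp").encode()


def decode_subject(value: str) -> str:
    """Decode a header value that may contain encoded-words."""
    return "".join(
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in decode_header(value)
    )


if __name__ == "__main__":
    encoded = encode_subject("日本語")
    print(encoded)
    print(decode_subject(encoded))
```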
7. Additional Information
As long as mail systems are capable of writing out Unicode, it is
recommended to also write out Unicode text in addition to
"ISO-2022-JP" text.
Some mail systems write out 8bit characters in the 'parameter' and
'value' fields defined in [RFC 822] and [RFC 1521]. 8bit characters
must not be used in those fields. Future mail system implementations
should support them only for interoperability reasons.
8. References
[ISO2022]
International Organization for Standardization (ISO),
"Information processing -- ISO 7-bit and 8-bit coded
character sets -- Code extension techniques",
International Standard, Ref. No. ISO 2022-1986 (E).
[ISOREG] International Organization for Standardization (ISO),
"International Register of Coded Character Sets To Be Used
With Escape Sequences".
[RFC-822]
Crocker, D., "Standard for the Format of ARPA Internet
Text Messages", RFC 822, August 1982.
[2022JP]
Murai, J., Crispin, M., and E. van der Poel, "Japanese
Character Encoding for Internet Messages", RFC 1468, June
1993.
[RFC-1766]
Alvestrand, H., "Tags for the Identification of
Languages", RFC 1766, March, 1995.
[RFC-2045]
Freed, N. and Borenstein, N., "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message
Bodies", RFC 2045, Innosoft, First Virtual Holdings,
December 1996.
[RFC-2046]
Freed, N. and Borenstein, N., "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046,
Innosoft, First Virtual Holdings, December 1996.
[RFC-2047]
Moore, K., "Multipurpose Internet Mail Extensions (MIME)
Part Three: Representation of Non-ASCII Text in Internet
Message Headers", RFC 2047, University of Tennessee,
December 1996.
[RFC-2048]
Freed, N., Klensin, J., Postel, J., "Multipurpose
Internet Mail Extensions (MIME) Part Four: MIME
Registration Procedures", RFC 2048, Innosoft, MCI, ISI,
December 1996.
[RFC-2049]
Freed, N. and Borenstein, N., "Multipurpose Internet Mail
Extensions (MIME) Part Five: Conformance Criteria and
Examples", RFC 2049, Innosoft, First Virtual Holdings,
December 1996.
9. Author's Address
Kenzaburo Tamaru
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
E-Mail: kenzat@microsoft.com