Character Sets


The following information is described in this section:

Charset Recognition in IE3

This charset recognition specification defines which charset identifiers (charset IDs) Internet Explorer recognizes in the HTTP header of HTTP replies, and which charset IDs it recognizes in the <META ... CHARSET=charsetID> tag. It also specifies which built-in charset translation each charset ID maps to. It does not specify what IE should send as the ACCEPT-CHARSET parameter in the HTTP request.

Table of Base Charsets, Display Names, and Aliases

In the following table, the base charset is the basic translation built into IE3. The Aliases column lists all other charset IDs that are recognized and can be represented without further translation, using the base charset's translation method. This does not always mean that the alias and the base charset are the same charset; the alias charset can be a subset of the base charset. The base charset itself is not a recognized name unless it is repeated in the Aliases column.

Base Charset  Display Name                Aliases
1252          Western                     us-ascii, iso8859-1, ascii, iso_8859-1, iso-8859-1, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO_646.irv:1991, ISO646-US, us, IBM367, cp367, csASCII, latin1, iso_8859-1:1987, iso-ir-100, ibm819, cp819
28592         Central European (ISO)      iso8859-2, iso-8859-2, iso_8859-2, latin2, iso_8859-2:1987, iso-ir-101, l2, csISOLatin2
1250          Central European (Windows)  windows-1250, x-cp1250
1251          Cyrillic (Windows)          windows-1251, x-cp1251
1253          Greek (Windows)             windows-1253
1254          Turkish (Windows)           windows-1254
932           Shift-JIS                   shift_jis, x-sjis, ms_Kanji, csShiftJIS
EUC-JP        EUC                         Extended_UNIX_Code_Packed_Format_for_Japanese, csEUCPkdFmtJapanese, x-euc-jp
JIS           JIS                         csISO2022JP, iso-2022-jp
1257                                      windows-1257
950           Traditional Chinese (BIG5)  big5, csbig5, x-x-big5
936           Simplified Chinese          GB_2312-80, iso-ir-58, chinese, csISO58GB231280, csGB2312, gb2312
20866         Cyrillic (KOI8-R)           csKOI8R, koi8-r
949           Korean                      ks_c_5601, ks_c_5601-1987, korean, csKSC56011987
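
The alias lookup described by the table can be sketched in Python as follows. This is only an illustration, not IE's implementation: the dictionary holds a small excerpt of the table, the names ALIAS_TO_BASE and base_charset are hypothetical, and case-insensitive matching is an assumption (suggested by the "Windows-1251" spelling in the usage example, but not stated by the table).

```python
# Hypothetical excerpt of the alias table; keys are lowercased aliases.
ALIAS_TO_BASE = {
    "us-ascii": "1252",
    "iso-8859-1": "1252",
    "latin1": "1252",
    "iso-8859-2": "28592",
    "latin2": "28592",
    "windows-1250": "1250",
    "x-cp1250": "1250",
    "windows-1251": "1251",
    "shift_jis": "932",
    "x-sjis": "932",
    "x-euc-jp": "EUC-JP",
    "iso-2022-jp": "JIS",
    "big5": "950",
    "gb2312": "936",
    "koi8-r": "20866",
    "ks_c_5601": "949",
}


def base_charset(charset_id):
    """Return the base charset for a charset ID, or None if unrecognized.

    Assumption: matching ignores case and surrounding whitespace.
    """
    return ALIAS_TO_BASE.get(charset_id.strip().lower())
```

Under these assumptions, base_charset("Windows-1251") returns "1251", while base_charset("1251") returns None, because a base charset is not a recognized name unless it also appears as an alias.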

Correct Usage

The correct usage is as specified in RFC 1341. For example:

<META HTTP-EQUIV="Content-Type" 
  CONTENT="text/html; charset=Windows-1251"> 

Place this tag in HEAD, or before it, and in any case before BODY.
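
As a rough illustration of how the charset ID can be pulled out of a CONTENT value such as the one above, here is a minimal Python sketch. The function name charset_from_content is hypothetical, and the parsing is deliberately simplified (it splits on semicolons and does not handle every quoting rule allowed for MIME parameters).

```python
def charset_from_content(content):
    """Extract the charset parameter from a Content-Type value, or None.

    Simplified sketch: parameters are assumed to be separated by
    semicolons and to contain no embedded semicolons.
    """
    # The first piece is the media type (e.g. text/html); skip it.
    for param in content.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("\"'")
    return None
```

For example, charset_from_content("text/html; charset=Windows-1251") returns "Windows-1251", and a value with no charset parameter returns None.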

Priority

The following list shows the priorities of charset declarations that IE will use.

  1. Use any charset parameter passed in the HTTP content-type.
  2. Use the <META ... CHARSET...> tag.
  3. Use the user preference for default document encoding.

A frameset can have differing charsets per frame.
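
The priority order listed above amounts to taking the first declaration that is present. A minimal Python sketch, assuming each source is represented as a string or None (the function and parameter names are hypothetical):

```python
def effective_charset(http_charset, meta_charset, user_default):
    """Pick a charset using the priority order: HTTP header, META, user default.

    Each argument is a charset ID string, or None when that source
    did not declare a charset.
    """
    for candidate in (http_charset, meta_charset, user_default):
        if candidate:
            return candidate
    return None
```

With this sketch, a charset passed in the HTTP content-type wins over a conflicting META tag, and the user preference is consulted only when neither is present.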

Position of <META ... CHARSET=...> in the Document

The <META ... CHARSET=...> sequence can appear anywhere in the document before the BODY tag. Wherever it appears, it affects the whole document, including any TITLE that appears before the <META ... CHARSET=...> tag.
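
One way to picture this rule is a pre-scan: look for the charset declaration in the raw bytes that precede BODY, and only then decode the whole document. The Python sketch below is an illustration under stated assumptions, not a description of how IE does it; the regular expressions and the name prescan_meta_charset are hypothetical, and the pattern is far looser than a real HTML parser.

```python
import re

# Hypothetical patterns: a META tag carrying charset=..., and the start of BODY.
META_CHARSET = re.compile(
    rb'<meta\b[^>]*?charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE
)
BODY_TAG = re.compile(rb"<body\b", re.IGNORECASE)


def prescan_meta_charset(raw):
    """Return the charset ID declared by a META tag before BODY, or None."""
    body = BODY_TAG.search(raw)
    # Only the part of the document before the BODY tag is considered.
    head = raw[: body.start()] if body else raw
    match = META_CHARSET.search(head)
    return match.group(1).decode("ascii") if match else None
```

Because the scan runs over undecoded bytes, a TITLE that precedes the META tag can still be decoded with the declared charset afterwards.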

ISO Latin-1 Character Set

The following table contains the accented and special letters of the ISO Latin-1 character set. For each character, it gives the decimal code, the HTML entity reference, and a brief description.

Character Decimal Code HTML Description
À &#192; &Agrave; Capital A, grave accent
à &#224; &agrave; Small a, grave accent
Á &#193; &Aacute; Capital A, acute accent
á &#225; &aacute; Small a, acute accent
 &#194; &Acirc; Capital A, circumflex
â &#226; &acirc; Small a, circumflex
à &#195; &Atilde; Capital A, tilde
ã &#227; &atilde; Small a, tilde
Ä &#196; &Auml; Capital A, diæresis / umlaut
ä &#228; &auml; Small a, diæresis / umlaut
Å &#197; &Aring; Capital A, ring
å &#229; &aring; Small a, ring
Æ &#198; &AElig; Capital AE ligature
æ &#230; &aelig; Small ae ligature
Ç &#199; &Ccedil; Capital C, cedilla
ç &#231; &ccedil; Small c, cedilla
È &#200; &Egrave; Capital E, grave accent
è &#232; &egrave; Small e, grave accent
É &#201; &Eacute; Capital E, acute accent
é &#233; &eacute; Small e, acute accent
Ê &#202; &Ecirc; Capital E, circumflex
ê &#234; &ecirc; Small e, circumflex
Ë &#203; &Euml; Capital E, diæresis / umlaut
ë &#235; &euml; Small e, diæresis / umlaut
Ì &#204; &Igrave; Capital I, grave accent
ì &#236; &igrave; Small i, grave accent
Í &#205; &Iacute; Capital I, acute accent
í &#237; &iacute; Small i, acute accent
Î &#206; &Icirc; Capital I, circumflex
î &#238; &icirc; Small i, circumflex
Ï &#207; &Iuml; Capital I, diæresis / umlaut
ï &#239; &iuml; Small i, diæresis / umlaut
Ð &#208; &ETH; Capital Eth, Icelandic
ð &#240; &eth; Small eth, Icelandic
Ñ &#209; &Ntilde; Capital N, tilde
ñ &#241; &ntilde; Small n, tilde
Ò &#210; &Ograve; Capital O, grave accent
ò &#242; &ograve; Small o, grave accent
Ó &#211; &Oacute; Capital O, acute accent
ó &#243; &oacute; Small o, acute accent
Ô &#212; &Ocirc; Capital O, circumflex
ô &#244; &ocirc; Small o, circumflex
Õ &#213; &Otilde; Capital O, tilde
õ &#245; &otilde; Small o, tilde
Ö &#214; &Ouml; Capital O, diæresis / umlaut
ö &#246; &ouml; Small o, diæresis / umlaut
Ø &#216; &Oslash; Capital O, slash
ø &#248; &oslash; Small o, slash
Ù &#217; &Ugrave; Capital U, grave accent
ù &#249; &ugrave; Small u, grave accent
Ú &#218; &Uacute; Capital U, acute accent
ú &#250; &uacute; Small u, acute accent
Û &#219; &Ucirc; Capital U, circumflex
û &#251; &ucirc; Small u, circumflex
Ü &#220; &Uuml; Capital U, diæresis / umlaut
ü &#252; &uuml; Small u, diæresis / umlaut
Ý &#221; &Yacute; Capital Y, acute accent
ý &#253; &yacute; Small y, acute accent
Þ &#222; &THORN; Capital Thorn, Icelandic
þ &#254; &thorn; Small thorn, Icelandic
ß &#223; &szlig; Small sharp s, German sz
ÿ &#255; &yuml; Small y, diæresis / umlaut
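
The relationship between the decimal code and the entity reference in each row can be checked with Python's standard html module. This is a small sketch for illustration; decimal_reference is a hypothetical helper name, and Python's entity table is used here only as a convenient stand-in for a browser's.

```python
import html
from html.entities import name2codepoint


def decimal_reference(ch):
    """Return the decimal character reference for a single character."""
    return "&#%d;" % ord(ch)


# Both forms of a reference decode to the same character.
decoded_from_code = html.unescape("&#192;")
decoded_from_name = html.unescape("&Agrave;")

# The named entity maps to the same number used in the decimal code.
agrave_code = name2codepoint["Agrave"]
```

Here decoded_from_code and decoded_from_name are both "À", agrave_code is 192, and decimal_reference("À") gives "&#192;", matching the first row of the table.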

Character Set

The following table describes the complete character set for Internet Explorer 3.0 English (U.S.). The first column shows the character as it appears in Internet Explorer 3.0. The second column shows the decimal code as it is written in an HTML document to produce the character. Some characters also have mnemonic entity names; for example, the registered trademark character can be written in HTML as &reg;. The third column lists these entity names where they exist. The last column gives a description of each character where appropriate.

Character Decimal Code HTML Description
&#00; Unused
&#01; Unused
&#02; Unused
&#03; Unused
&#04; Unused
&#05; Unused
&#06; Unused
&#07; Unused
&#08; Unused
&#09; Horizontal tab
&#10; Line feed
&#11; Unused
&#12; Unused
&#13; Carriage Return
&#14; Unused
&#15; Unused
&#16; Unused
&#17; Unused
&#18; Unused
&#19; Unused
&#20; Unused
&#21; Unused
&#22; Unused
&#23; Unused
&#24; Unused
&#25; Unused
&#26; Unused
&#27; Unused
&#28; Unused
&#29; Unused
&#30; Unused
&#31; Unused
&#32; Space
! &#33; Exclamation mark
" &#34; &quot; Quotation mark
# &#35; Number sign
$ &#36; Dollar sign
% &#37; Percent sign
& &#38; &amp; Ampersand
' &#39; Apostrophe
( &#40; Left parenthesis
) &#41; Right parenthesis
* &#42; Asterisk
+ &#43; Plus sign
, &#44; Comma
- &#45; Hyphen
. &#46; Period (full stop)
/ &#47; Solidus (slash)
0 &#48; Digit 0
1 &#49; Digit 1
2 &#50; Digit 2
3 &#51; Digit 3
4 &#52; Digit 4
5 &#53; Digit 5
6 &#54; Digit 6
7 &#55; Digit 7
8 &#56; Digit 8
9 &#57; Digit 9
: &#58; Colon
; &#59; Semicolon
< &#60; &lt; Less than
= &#61; Equals sign
> &#62; &gt; Greater than
? &#63; Question mark
@ &#64; Commercial at
A &#65; Capital A
B &#66; Capital B
C &#67; Capital C
D &#68; Capital D
E &#69; Capital E
F &#70; Capital F
G &#71; Capital G
H &#72; Capital H
I &#73; Capital I
J &#74; Capital J
K &#75; Capital K
L &#76; Capital L
M &#77; Capital M
N &#78; Capital N
O &#79; Capital O
P &#80; Capital P
Q &#81; Capital Q
R &#82; Capital R
S &#83; Capital S
T &#84; Capital T
U &#85; Capital U
V &#86; Capital V
W &#87; Capital W
X &#88; Capital X
Y &#89; Capital Y
Z &#90; Capital Z
[ &#91; Left square bracket
\ &#92; Reverse solidus (backslash)
] &#93; Right square bracket
^ &#94; Caret
_ &#95; Horizontal bar (underscore)
` &#96; Grave accent
a &#97; Small a
b &#98; Small b
c &#99; Small c
d &#100; Small d
e &#101; Small e
f &#102; Small f
g &#103; Small g
h &#104; Small h
i &#105; Small i
j &#106; Small j
k &#107; Small k
l &#108; Small l
m &#109; Small m
n &#110; Small n
o &#111; Small o
p &#112; Small p
q &#113; Small q
r &#114; Small r
s &#115; Small s
t &#116; Small t
u &#117; Small u
v &#118; Small v
w &#119; Small w
x &#120; Small x
y &#121; Small y
z &#122; Small z
{ &#123; Left curly brace
| &#124; Vertical bar
} &#125; Right curly brace
~ &#126; Tilde
 &#127; Unused
&#128; Unused
  &#160; &nbsp; Non-breaking Space
¡ &#161; &iexcl; Inverted exclamation
¢ &#162; &cent; Cent sign
£ &#163; &pound; Pound sterling
¤ &#164; &curren; General currency sign
¥ &#165; &yen; Yen sign
¦ &#166; &brvbar; or &brkbar; Broken vertical bar
§ &#167; &sect; Section sign
¨ &#168; &uml; or &die; Diæresis / Umlaut
© &#169; &copy; Copyright
ª &#170; &ordf; Feminine ordinal
« &#171; &laquo; Left angle quote, guillemot left
¬ &#172; &not; Not sign
­ &#173; &shy; Soft hyphen
® &#174; &reg; Registered trademark
¯ &#175; &macr; or &hibar; Macron accent
° &#176; &deg; Degree sign
± &#177; &plusmn; Plus or minus
² &#178; &sup2; Superscript two
³ &#179; &sup3; Superscript three
´ &#180; &acute; Acute accent
µ &#181; &micro; Micro sign
¶ &#182; &para; Paragraph sign
· &#183; &middot; Middle dot
¸ &#184; &cedil; Cedilla
¹ &#185; &sup1; Superscript one
º &#186; &ordm; Masculine ordinal
» &#187; &raquo; Right angle quote, guillemot right
¼ &#188; &frac14; Fraction one-fourth
½ &#189; &frac12; Fraction one-half
¾ &#190; &frac34; Fraction three-fourths
¿ &#191; &iquest; Inverted question mark
À &#192; &Agrave; Capital A, grave accent
Á &#193; &Aacute; Capital A, acute accent
 &#194; &Acirc; Capital A, circumflex
à &#195; &Atilde; Capital A, tilde
Ä &#196; &Auml; Capital A, diæresis / umlaut
Å &#197; &Aring; Capital A, ring
Æ &#198; &AElig; Capital AE ligature
Ç &#199; &Ccedil; Capital C, cedilla
È &#200; &Egrave; Capital E, grave accent
É &#201; &Eacute; Capital E, acute accent
Ê &#202; &Ecirc; Capital E, circumflex
Ë &#203; &Euml; Capital E, diæresis / umlaut
Ì &#204; &Igrave; Capital I, grave accent
Í &#205; &Iacute; Capital I, acute accent
Î &#206; &Icirc; Capital I, circumflex
Ï &#207; &Iuml; Capital I, diæresis / umlaut
Ð &#208; &ETH; Capital Eth, Icelandic
Ñ &#209; &Ntilde; Capital N, tilde
Ò &#210; &Ograve; Capital O, grave accent
Ó &#211; &Oacute; Capital O, acute accent
Ô &#212; &Ocirc; Capital O, circumflex
Õ &#213; &Otilde; Capital O, tilde
Ö &#214; &Ouml; Capital O, diæresis / umlaut
× &#215; &times; Multiply sign
Ø &#216; &Oslash; Capital O, slash
Ù &#217; &Ugrave; Capital U, grave accent
Ú &#218; &Uacute; Capital U, acute accent
Û &#219; &Ucirc; Capital U, circumflex
Ü &#220; &Uuml; Capital U, diæresis / umlaut
Ý &#221; &Yacute; Capital Y, acute accent
Þ &#222; &THORN; Capital Thorn, Icelandic
ß &#223; &szlig; Small sharp s, German sz
à &#224; &agrave; Small a, grave accent
á &#225; &aacute; Small a, acute accent
â &#226; &acirc; Small a, circumflex
ã &#227; &atilde; Small a, tilde
ä &#228; &auml; Small a, diæresis / umlaut
å &#229; &aring; Small a, ring
æ &#230; &aelig; Small ae ligature
ç &#231; &ccedil; Small c, cedilla
è &#232; &egrave; Small e, grave accent
é &#233; &eacute; Small e, acute accent
ê &#234; &ecirc; Small e, circumflex
ë &#235; &euml; Small e, diæresis / umlaut
ì &#236; &igrave; Small i, grave accent
í &#237; &iacute; Small i, acute accent
î &#238; &icirc; Small i, circumflex
ï &#239; &iuml; Small i, diæresis / umlaut
ð &#240; &eth; Small eth, Icelandic
ñ &#241; &ntilde; Small n, tilde
ò &#242; &ograve; Small o, grave accent
ó &#243; &oacute; Small o, acute accent
ô &#244; &ocirc; Small o, circumflex
õ &#245; &otilde; Small o, tilde
ö &#246; &ouml; Small o, diæresis / umlaut
÷ &#247; &divide; Division sign
ø &#248; &oslash; Small o, slash
ù &#249; &ugrave; Small u, grave accent
ú &#250; &uacute; Small u, acute accent
û &#251; &ucirc; Small u, circumflex
ü &#252; &uuml; Small u, diæresis / umlaut
ý &#253; &yacute; Small y, acute accent
þ &#254; &thorn; Small thorn, Icelandic
ÿ &#255; &yuml; Small y, diæresis / umlaut
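
Rows in the style of the table above (character, decimal code, entity name where one exists) can be generated from a code number. The following Python sketch is illustrative only: table_row is a hypothetical name, and the entity names come from Python's html.entities table, which may not include every alternative name listed above.

```python
from html.entities import codepoint2name


def table_row(code):
    """Format one row: character, decimal reference, named entity (if any)."""
    fields = [chr(code), "&#%d;" % code]
    name = codepoint2name.get(code)
    if name:
        # Only some characters have a mnemonic entity name.
        fields.append("&%s;" % name)
    return " ".join(fields)
```

For instance, table_row(174) yields "® &#174; &reg;", while table_row(65) yields "A &#65;" because the capital letter A has no named entity.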
© 1996 Microsoft Corporation