Use Unicode

Internally, the editor uses 16-bit characters, which cover the Unicode character set. When loading a file, the editor tries to determine which encoding the file uses. If it cannot determine the encoding, it displays a list of all known encodings. This list contains all the encodings accepted by the Java platform:


1. Big5
2. CP037
3. CP278
4. CP280
5. CP284
6. CP285
7. CP297
8. CP420
9. CP424
10. CP500
11. CP870
12. CP871
13. CP918
14. EUCJIS
15. GB2312
16. ISO2022CN
17. ISO2022KR
18. ISO8859_1
19. ISO8859_2
20. ISO8859_3
21. ISO8859_4
22. ISO8859_5
23. ISO8859_6
24. ISO8859_7
25. ISO8859_8
26. ISO8859_9
27. JIS
28. JIS0201
29. JIS0208
30. JIS0212
31. KOI8_R
32. KSC5601
33. MS932
34. SJIS
35. UTF8
36. Unicode
37. UnicodeBig
38. UnicodeLittle
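
As a rough illustration of where such a list can come from, the sketch below asks the Java platform which encodings it supports. It is not the editor's actual code: the class name EncodingList and its methods are hypothetical, and the names printed depend on the JVM in use. Recent JVMs report canonical names such as "UTF-8" rather than the older names listed above.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the editor.
final class EncodingList {

    /** Returns the canonical names of all encodings this JVM supports, in sorted order. */
    static List<String> knownEncodings() {
        // availableCharsets() returns a sorted map keyed by canonical charset name.
        return new ArrayList<>(Charset.availableCharsets().keySet());
    }

    /** Returns true if the JVM accepts the given encoding name (canonical name or alias). */
    static boolean isKnown(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal names.
            return false;
        }
    }

    public static void main(String[] args) {
        int i = 1;
        for (String name : knownEncodings()) {
            System.out.println(i++ + ". " + name);
        }
    }
}
```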

Note that there is a small difference in how the same character encoding is named. In XML you write encoding="UTF-8", while in Java the same encoding is named "UTF8".
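
The naming difference can be checked with the standard java.nio.charset.Charset class, which accepts "UTF8" as an alias of "UTF-8". The sketch below is only an illustration; the class name EncodingNames and its methods are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the editor.
final class EncodingNames {

    /** Resolves a Java-style or XML-style encoding name to the charset's canonical name. */
    static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    /** Returns true if both names refer to the same encoding. */
    static boolean sameEncoding(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("UTF8"));         // prints "UTF-8"
        System.out.println(sameEncoding("UTF-8", "UTF8")); // prints "true"
    }
}
```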

If you have loaded a document and you need it in a different encoding, you can specify a different encoding name in the XML prolog. For example, if the prolog names the Western European encoding, the editor automatically saves the file using that encoding.
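
The effect of changing the encoding name in the prolog can be sketched as follows. This is a minimal illustration under stated assumptions, not the editor's implementation: the class PrologEncoding, its methods, and the sample prolog <?xml version="1.0" encoding="ISO-8859-1"?> used in main are all hypothetical. The sketch rewrites the declared encoding name and then encodes the text with the charset the prolog names, falling back to UTF-8 when no encoding is declared.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the editor.
final class PrologEncoding {

    // Matches the encoding pseudo-attribute of an XML declaration at the start of the text.
    private static final Pattern ENCODING = Pattern.compile(
            "^(<\\?xml[^>]*?encoding\\s*=\\s*[\"'])([A-Za-z][A-Za-z0-9._-]*)([\"'])");

    /** Returns the encoding named in the prolog, or "UTF-8" if none is declared. */
    static String declaredEncoding(String xml) {
        Matcher m = ENCODING.matcher(xml);
        return m.find() ? m.group(2) : "UTF-8";
    }

    /** Returns the document text with the declared encoding name replaced. */
    static String withEncoding(String xml, String newName) {
        Matcher m = ENCODING.matcher(xml);
        if (!m.find()) {
            return xml; // no encoding declaration to change
        }
        return m.group(1) + newName + m.group(3) + xml.substring(m.end());
    }

    /** Encodes the document using the encoding its own prolog declares. */
    static byte[] save(String xml) {
        return xml.getBytes(Charset.forName(declaredEncoding(xml)));
    }

    public static void main(String[] args) {
        String utf8 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<a>\u00e9</a>";
        String latin1 = withEncoding(utf8, "ISO-8859-1");
        System.out.println(declaredEncoding(latin1));
        System.out.println(save(latin1).length + " bytes");
    }
}
```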

The following encodings can be specified in XML files. The Java encoder name is the one the editor asks you for if it cannot detect the encoding of the file.

| Common Name | Use this name in XML files | Name Type | Java Encoder Name |
|---|---|---|---|
| 8 bit Unicode | UTF-8 | IANA | UTF8 |
| 16 bit Unicode | UTF-16 | IANA | Unicode |
| 16 bit Unicode little endian | UTF-16LE | IANA | UnicodeLittle |
| 16 bit Unicode big endian | UTF-16BE | IANA | UnicodeBig |
| ISO Latin 1 | ISO-8859-1 | MIME | ISO-8859-1 |
| ISO Latin 2 | ISO-8859-2 | MIME | ISO-8859-2 |
| ISO Latin 3 | ISO-8859-3 | MIME | ISO-8859-3 |
| ISO Latin 4 | ISO-8859-4 | MIME | ISO-8859-4 |
| ISO Latin Cyrillic | ISO-8859-5 | MIME | ISO-8859-5 |
| ISO Latin Arabic | ISO-8859-6 | MIME | ISO-8859-6 |
| ISO Latin Greek | ISO-8859-7 | MIME | ISO-8859-7 |
| ISO Latin Hebrew | ISO-8859-8 | MIME | ISO-8859-8 |
| ISO Latin 5 | ISO-8859-9 | MIME | ISO-8859-9 |
| EBCDIC: US | ebcdic-cp-us | IANA | cp037 |
| EBCDIC: Canada | ebcdic-cp-ca | IANA | cp037 |
| EBCDIC: Netherlands | ebcdic-cp-nl | IANA | cp037 |
| EBCDIC: Denmark | ebcdic-cp-dk | IANA | cp277 |
| EBCDIC: Norway | ebcdic-cp-no | IANA | cp277 |
| EBCDIC: Finland | ebcdic-cp-fi | IANA | cp278 |
| EBCDIC: Sweden | ebcdic-cp-se | IANA | cp278 |
| EBCDIC: Italy | ebcdic-cp-it | IANA | cp280 |
| EBCDIC: Spain, Latin America | ebcdic-cp-es | IANA | cp284 |
| EBCDIC: Great Britain | ebcdic-cp-gb | IANA | cp285 |
| EBCDIC: France | ebcdic-cp-fr | IANA | cp297 |
| EBCDIC: Arabic | ebcdic-cp-ar1 | IANA | cp420 |
| EBCDIC: Hebrew | ebcdic-cp-he | IANA | cp424 |
| EBCDIC: Switzerland | ebcdic-cp-ch | IANA | cp500 |
| EBCDIC: Roece | ebcdic-cp-roece | IANA | cp870 |
| EBCDIC: Yugoslavia | ebcdic-cp-yu | IANA | cp870 |
| EBCDIC: Iceland | ebcdic-cp-is | IANA | cp871 |
| EBCDIC: Urdu | ebcdic-cp-ar2 | IANA | cp918 |
| Chinese for PRC, mixed 1/2 byte | gb2312 | MIME | GB2312 |
| Extended Unix Code, packed for Japanese | euc-jp | MIME | eucjis |
| Japanese: iso-2022-jp | iso-2022-jp | MIME | JIS |
| Japanese: Shift JIS | Shift_JIS | MIME | SJIS |
| Chinese: Big5 | Big5 | MIME | Big5 |
| Extended Unix Code, packed for Korean | euc-kr | MIME | iso2022kr |
| Cyrillic | koi8-r | MIME | koi8-r |
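
A table like this is essentially a lookup from the XML name to the Java encoder name. The sketch below shows one way such a lookup might be written, using only a handful of rows copied from the table. The class XmlToJavaEncoding and its method are hypothetical and do not describe how the editor stores the mapping. The lookup ignores case because encoding names in XML declarations are compared case-insensitively.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the editor.
final class XmlToJavaEncoding {

    // Keys are compared without regard to case, so "utf-8" and "UTF-8" both match.
    private static final Map<String, String> NAMES =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    static {
        // A few rows taken from the table; extend as needed.
        NAMES.put("UTF-8", "UTF8");
        NAMES.put("UTF-16", "Unicode");
        NAMES.put("UTF-16LE", "UnicodeLittle");
        NAMES.put("UTF-16BE", "UnicodeBig");
        NAMES.put("ebcdic-cp-us", "cp037");
        NAMES.put("Shift_JIS", "SJIS");
        NAMES.put("euc-jp", "eucjis");
    }

    /** Returns the Java encoder name for an XML encoding name, or null if it is not in the map. */
    static String javaName(String xmlName) {
        return xmlName == null ? null : NAMES.get(xmlName);
    }

    public static void main(String[] args) {
        System.out.println(javaName("UTF-8"));     // prints "UTF8"
        System.out.println(javaName("shift_jis")); // prints "SJIS"
    }
}
```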

Note that to edit a document written in, say, Japanese or Chinese, you need to change the font to one that supports the specific characters (a Unicode font). On the Windows platform, we recommend "Arial Unicode MS" or "MS Gothic". Do not expect WordPad or Notepad to handle encodings properly. Use Internet Explorer or Word to examine XML documents.