- Shin-JIS code tables
- Michael J. Imber (CIS 71631,563)
-
- This arc file contains a set of files which display two-character
- Shin-JIS ascii codes along with the corresponding Japanese character
- when viewed with the display program, KanjiView 2.0, by Steven
- Johnston. The files were constructed so that a single file contains
- all the possible characters encoded by character combinations C1-C2,
- where C1 is a constant character throughout the file and C2 is a
- variable character.
-
- The files are designed to display as a single screen when
- viewed on an IBM-compatible PC with VGA graphics. Each file will
- appear as five double rows separated by an empty row. Each double row
- includes 20 code-character combinations, with the two-character ascii
- code appearing above the corresponding Japanese character. I generated
- these files using XyWrite III+ as part of a project to develop XPL
- routines to convert roman text to the appropriate JIS codes. Perhaps
- the material as organized herein may be of use to others with similar
- interests.
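-
- As a rough check on the screen layout described above, the sketch
- below splits the 94 codes of one file into rows of 20. The function
- name layout_rows is hypothetical, and the sketch models only the
- grouping of codes into rows, not the actual file contents read by
- KanjiView.

```python
def layout_rows(c1, per_row=20):
    """Split the 94 codes C1+C2 (C2 = ascii 33-126) into rows of per_row."""
    codes = [c1 + chr(n) for n in range(33, 127)]
    return [codes[i:i + per_row] for i in range(0, len(codes), per_row)]

# Example for the file whose constant first character is "0".
rows = layout_rows("0")
print([len(r) for r in rows])  # [20, 20, 20, 20, 14]
```

- Under this assumption the fifth row holds the remaining 14 codes,
- which is consistent with five double rows per screen.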
-
- Below are a number of empirical observations and comments
- regarding the organization of the JIS codes and corresponding Japanese
- characters. The files and the following observations are based entire-
- ly on my experimenting with Steve Johnston's KanjiView 2.0 program
- using different character combinations. Any errors in observation or
- interpretation are entirely due to my own inadequate efforts.
-
- 1. The range of ascii characters employed in these JIS codes
- includes the 94 characters represented by ascii 33-126, inclusive; that
- is, beginning with character ! (#33) and ending with ~ (#126). These are
- all within the 7-bit, printable ascii range, unlike the Shift-JIS
- codes, which employ byte values above 127.
-
- 2. Each Japanese character is represented by a two-character com-
- bination. Therefore 94x94, or 8836 codes are possible. Study of the
- various ascii combinations, however, reveals that not all such codes
- are employed for unique characters.
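-
- The arithmetic in items 1 and 2 can be confirmed with a short
- enumeration. This is only a sketch; the names JIS_CHARS and
- all_code_pairs are hypothetical and are not part of any program
- mentioned here.

```python
# Printable 7-bit ascii characters used in both positions: 33 (!) to 126 (~).
JIS_CHARS = [chr(n) for n in range(33, 127)]

def all_code_pairs():
    """Yield every two-character combination C1 + C2."""
    for c1 in JIS_CHARS:
        for c2 in JIS_CHARS:
            yield c1 + c2

pairs = list(all_code_pairs())
print(len(JIS_CHARS), len(pairs))  # 94 8836
```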
-
- 3. The included files display all possible ascii character com-
- binations "C1-C2" where the first character, C1, is constant, and the
- second character "C2" varies between ascii 33(!) and 126(~). All files
- have the filename SHIN2. The extension of each filename reflects the
- constant first character, C1, used in that file. For example, all
- files in which C1 is an alphanumeric character, i.e. ascii #48-57
- (0123456789) or #65-79 (A-O), have the extension corresponding to the
- alphanumeric character. Therefore the files displaying code
- combinations in which the first character is a numeral are shin2.0,
- shin2.1, shin2.2, etc. Likewise for roman letters: shin2.A, shin2.B,
- shin2.C, etc. Those files representing non-alphanumeric first
- characters have the extension corresponding to the ascii code itself.
- For example, shin2.33 includes all character combinations with ! as
- the first character. Those files which display a degenerate set of
- characters are omitted from this collection, with the exception of
- shin2.P.
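-
- The naming rule in item 3 might be sketched as follows. The function
- name table_filename is hypothetical. The rule is confirmed here only
- for the files actually in the collection (digits, A-O, P, and the
- non-alphanumeric characters); applying it to any other first
- character is an assumption.

```python
import string

def table_filename(c1):
    """Return the file name for the table whose constant first character is c1."""
    if len(c1) != 1 or not 33 <= ord(c1) <= 126:
        raise ValueError("C1 must be a single character in ascii 33-126")
    if c1 in string.digits or c1 in string.ascii_uppercase:
        # Digits and capital letters: the extension is the character itself.
        return "shin2." + c1
    # Everything else: the extension is the decimal ascii code.
    return "shin2." + str(ord(c1))

print(table_filename("0"), table_filename("A"), table_filename("!"))
```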
-
- 4. The range of characters, C1, which yields unique sets of
- characters is limited to the following:
- ascii 33-40 !"#$%&'(
- ascii 48-57 0123456789
- ascii 58-64 :;<=>?@
- ascii 65-79 ABCDEFGHIJKLMNO
- Other characters, C1, from ascii ranges 41-47 and 80-126 generate a
- degenerate set of characters or noise which is similar from set to
- set. The collection of files includes shin2.P as an example of such a
- degenerate set.
-
- 5. The JIS codes appear to segregate into two groups of two-
- character combinations. Those combinations where the first character
- ranges from ascii 33-40 represent non-Kanji characters, whereas com-
- binations in which the first character ranges from 48-79 represent the
- Kanji.
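-
- The grouping by first character in items 4 and 5 can be written as
- a small lookup. This is a sketch of my observations only, with a
- hypothetical function name; it classifies by C1 alone and so does
- not account for degenerate codes inside an otherwise valid file.

```python
def classify_first_char(c1):
    """Classify a first character C1 by the ascii ranges observed above."""
    code = ord(c1)
    if 33 <= code <= 40:
        return "non-kanji"
    if 48 <= code <= 79:
        return "kanji"
    if 41 <= code <= 47 or 80 <= code <= 126:
        return "degenerate"
    raise ValueError("C1 must be in ascii 33-126")

for ch in "!(0OP":
    print(ch, classify_first_char(ch))
```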
-
- 6. The non-Kanji code combinations appear to segregate as follows,
- according to the first character:
- ascii 33  !  blank space, punctuation marks, symbols, operators
- ascii 34  "  symbols, operators, musical notations
- ascii 35  #  Arabic numerals, Roman alphabet
- ascii 36  $  Hiragana
- ascii 37  %  Katakana
- ascii 38  &  Greek alphabet
- ascii 39  '  Russian alphabet
- ascii 40  (  line graphic characters
- Within each file displaying all 94 character combinations, many Kanji
- and occasionally other characters may be observed occupying degenerate
- stretches of JIS code. The display of alphanumeric characters is easy
- to remember since the second character of the two-character
- combination corresponds to the actual JIS display; i.e. #5 displays 5,
- #D displays D, #r displays r, etc.
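-
- The alphanumeric rule above lends itself to a very short conversion
- routine. The sketch below is in Python rather than XPL, the function
- name alnum_to_jis is hypothetical, and it handles only ascii letters
- and digits; anything else is rejected rather than guessed at.

```python
def alnum_to_jis(text):
    """Return the two-character JIS codes for ascii letters and digits."""
    codes = []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            # First character "#" selects the numeral/Roman alphabet set.
            codes.append("#" + ch)
        else:
            raise ValueError("no rule given here for character %r" % ch)
    return codes

print(alnum_to_jis("5Dr"))  # ['#5', '#D', '#r']
```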
-
- 7. The Kanji are represented by code combinations in which the first
- character ranges from 48 (Arabic numeral 0) to 79 (Roman letter O).
- The final code combination actually corresponding to a unique Kanji
- character is OS. Code combinations following appear degenerate. The
- ordering of Kanji characters by JIS code combinations 0! to OS appears
- to be primarily according to "on" phonetizations, and secondarily to
- "kun" where no "on" exists. Therefore, random identification of
- several Kanji from each file will enable construction of a rough a-i-
- u-e-o glossary of phonetizations, from which other characters and cor-
- responding JIS codes may be retrieved.
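-
- For reference, the number of code combinations in the range 0! to
- OS can be counted directly. The sketch below (hypothetical function
- name) counts code positions only; it does not verify that every
- position displays a unique Kanji.

```python
def kanji_codes():
    """Yield the codes from "0!" through "OS" in ascii order."""
    for c1 in range(ord("0"), ord("O") + 1):
        for c2 in range(33, 127):
            code = chr(c1) + chr(c2)
            yield code
            if code == "OS":
                return  # last code observed to display a unique Kanji

codes = list(kanji_codes())
print(codes[0], codes[-1], len(codes))  # 0! OS 2965
```

- That is 31 full files of 94 codes (C1 from 0 to N) plus the 51 codes
- O! through OS.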
-
-
- I must emphasize that Steve Johnston's KanjiView 2.0 has been
- a tremendous contribution to all interested in the use of IBM-PC
- applications and the study of Japanese. I eagerly await the
- introduction of his editor, which should make an even greater impact.
-
- I hope that these files may be of interest to other CIS or
- FLEFO members interested in the Japanese language and JIS coding. I
- look forward to any comments or criticisms that others using this
- material might have.
-
-
- Michael J. Imber
- Boston, MA
- CIS 71631, 563
-