Play and Learn 2

Text File | 1989-07-03 | 5.8 KB | 118 lines

                         Shin-JIS code tables

                 Michael J. Imber (CIS 71631,563)

     This arc file contains a set of files which display two-character
Shin-JIS ascii codes along with the corresponding Japanese character
when viewed with the display program, KanjiView 2.0, by Steven
Johnston. The files were constructed so that a single file contains all
the possible characters encoded by character combinations C1-C2, where
C1 is a constant character throughout the file, and C2 is a variable
character. The files are designed to display as a single screen when
viewed on an IBM-compatible PC with VGA graphics. Each file will appear
as five double rows separated by an empty row. Each double row includes
20 code-character combinations, with the two-character ascii code
appearing above the corresponding Japanese character.

     I generated these files using XyWrite III+ as part of a project to
develop XPL routines to convert roman text to the appropriate JIS codes.
Perhaps the material as organized herein may be of use to others with
similar interests.

     Below are a number of empirical observations and comments regarding
the organization of the JIS codes and corresponding Japanese characters.
The files and the following observations are based entirely on my
experimenting with Steve Johnston's KanjiView 2.0 program using
different character combinations. Any errors in observation or
interpretation are entirely due to my own inadequate efforts.

1.   The range of ascii characters employed in these JIS codes includes
the 94 characters represented by ascii 33-126, inclusive; that is,
beginning with character ! (#33) and ending with ~ (#126). These are all
within the 7-bit, printable ascii range, unlike the Shift-JIS codes,
which employ 8-bit character codes above 128.

2.   Each Japanese character is represented by a two-character
combination. Therefore 94x94, or 8836, codes are possible. Study of the
various ascii combinations, however, reveals that not all such codes are
employed for unique characters.

3.   The included files display all possible ascii character
combinations "C1-C2" where the first character, C1, is constant, and the
second character, C2, varies between ascii 33 (!) and 126 (~). All files
have the filename SHIN2. The extension of each filename reflects the
constant first character, C1, used in that file. For example, all files
in which C1 is an alphanumeric character, i.e. ascii #48-57
(0123456789) or #65-80 (A-P), have the extension corresponding to the
alphanumeric character. Therefore the files displaying code combinations
in which the first character is a numeral are shin2.0, shin2.1, shin2.3,
etc. Likewise for roman letters: shin2.A, shin2.B, shin2.C, etc. Those
files representing non-alphanumeric first characters have the extension
corresponding to the ascii code itself. For example, shin2.33 includes
all character combinations with ! as the first character. Those files
which display a degenerate set of characters are omitted from this
collection, with the exception of shin2.P.

4.   The range of characters, C1, which yield unique sets of characters
is limited to the following:

          ascii 33-40     !"#$%&'(
          ascii 48-57     0123456789
          ascii 58-64     :;<=>?@
          ascii 65-79     ABCDEFGHIJKLMNO

Other characters, C1, from ascii ranges 41-47 and 80-126 generate a
degenerate set of characters or noise which is similar from set to set.
The collection of files includes shin2.P as an example of such a
degenerate set.

5.   The JIS codes appear to segregate into two groups of two-character
combinations. Those combinations where the first character ranges from
ascii 33-40 represent non-Kanji characters, whereas combinations in
which the first character ranges from 48-79 represent the Kanji.

6.   The non-Kanji code combinations appear to segregate as follows,
according to the first character:

          ascii 33   !   blank space, punctuation marks, symbols,
                         operators
          ascii 34   "   symbols, operators, musical notations
          ascii 35   #   Arabic numerals, Roman alphabet
          ascii 36   $   Hiragana
          ascii 37   %   Katakana
          ascii 38   &   Greek alphabet
          ascii 39   '   Russian alphabet
          ascii 40   (   line graphic characters

Within each file displaying all 94 character combinations, many Kanji
and occasionally other characters may be observed occupying degenerate
stretches of JIS code. The display of alphanumeric characters is easy to
remember since the second character of the two-character combination
corresponds to the actual JIS display; i.e., #5 displays 5, #D displays
D, #r displays r, etc.

7.   The Kanji are represented by code combinations in which the first
character ranges from 48 (Arabic numeral 0) to 79 (Roman letter O). The
final code combination actually corresponding to a unique Kanji
character is OS. Code combinations following it appear degenerate. The
ordering of Kanji characters by JIS code combinations 0! to OS appears
to be primarily according to "on" phonetizations, and secondarily
according to "kun" where no "on" exists. Therefore, random
identification of several Kanji from each file will enable construction
of a rough a-i-u-e-o glossary of phonetizations, from which other
characters and corresponding JIS codes may be retrieved.

     I must emphasize that Steve Johnston's KanjiView 2.0 has been a
tremendous contribution to all interested in the use of IBM-PC
applications and the study of Japanese. I eagerly await the introduction
of his editor, which should make an even greater impact.

     I hope that these files may be of interest to other CIS or FLEFO
members interested in the Japanese language and JIS coding. I look
forward to any comments or criticisms that others using this material
might have.

                                        Michael J. Imber
                                        Boston, MA
                                        CIS 71631,563
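
The naming and grouping rules in items 1 through 6 above can be restated as a short Python sketch. This is only an illustration of the observations as written, not the XPL routines mentioned earlier; every function and constant name below is hypothetical, and the "degenerate" label simply mirrors the empirical ranges reported in item 4.

```python
# Illustrative sketch of the observations in items 1-6.
# All names here are hypothetical; nothing below comes from KanjiView itself.

FIRST, LAST = 33, 126  # printable 7-bit ascii range used by the codes (item 1)

# Item 6: first character (ascii code) -> kind of non-Kanji characters shown
NON_KANJI_GROUPS = {
    33: "blank space, punctuation marks, symbols, operators",
    34: "symbols, operators, musical notations",
    35: "Arabic numerals, Roman alphabet",
    36: "Hiragana",
    37: "Katakana",
    38: "Greek alphabet",
    39: "Russian alphabet",
    40: "line graphic characters",
}


def _check(c1: str) -> int:
    """Return the ascii code of c1, rejecting anything outside 33-126."""
    code = ord(c1)
    if not FIRST <= code <= LAST:
        raise ValueError(f"first character must be ascii {FIRST}-{LAST}")
    return code


def total_codes() -> int:
    """Number of possible two-character combinations (item 2)."""
    n = LAST - FIRST + 1
    return n * n


def shin2_filename(c1: str) -> str:
    """File name for the table whose constant first character is c1 (item 3)."""
    code = _check(c1)
    # Digits and the letters A-P use the character itself as the extension;
    # every other first character uses its decimal ascii code.
    if c1.isdigit() or "A" <= c1 <= "P":
        return f"shin2.{c1}"
    return f"shin2.{code}"


def classify_c1(c1: str) -> str:
    """Group that a first character falls into (items 4-6)."""
    code = _check(c1)
    if code in NON_KANJI_GROUPS:
        return NON_KANJI_GROUPS[code]
    if 48 <= code <= 79:
        return "Kanji"
    return "degenerate"  # ascii 41-47 and 80-126


def codes_in_file(c1: str) -> list[str]:
    """All 94 two-character codes C1-C2 displayed in one file."""
    _check(c1)
    return [c1 + chr(c2) for c2 in range(FIRST, LAST + 1)]


def display_rows(c1: str, per_row: int = 20) -> list[list[str]]:
    """Split a file's codes into rows of 20, as in the screen layout."""
    codes = codes_in_file(c1)
    return [codes[i:i + per_row] for i in range(0, len(codes), per_row)]
```

For example, shin2_filename("!") gives "shin2.33" and classify_c1("$") gives "Hiragana", matching the examples in items 3 and 6. Since 94 codes do not fill five rows of 20, display_rows returns four full rows and a final row of 14.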