J D I C
English Japanese Dictionary Display
===================================
Version 1.5 (November 1991)
INTRODUCTION
------------
This program provides an English/Japanese (kana & kanji) display of selected
entries of a dictionary file. While it will work (more or less) with any text
file containing a mix of Japanese and English words, it has been designed
specifically to operate on a dictionary in the "EDICT" format used by the MOKE
(Mark's Own Kanji Editor) Japanese text editor. Its operating environment has
been designed to be similar to MOKE's, and it uses the same environment
variables and control file as MOKE.
The executable code and documentation of JDIC is hereby released to the "public
domain". All usage of this program is at the user's risk, and there is no
warranty on its performance.
All the Japanese displayed is in kana and kanji, so if you cannot read at least
hiragana and katakana, this is not the program for you.
INSTALLATION
------------
This program is distributed as a "zoo" archive (jdic15.zoo) containing the
following files:
    jdic.exe      (the executable)
    jdic15.doc    (this documentation file)
    jdxgen.exe    (program to generate the dictionary index file)
    jreader.exe   (text reader with dictionary lookup)
    jreader.doc   (documentation for jreader)
    edictj        (Japanese/English dictionary file) (*)
    edictj.doc    (information about the dictionary) (*)
    *.bgi         (Borland Graphics drivers for various cards)
    kiascii.fnt   (alternative font file for ascii characters)
    jdicutil.exe  (utility program for printing, sorting, etc. the dictionary)
    jdicutil.doc  (documentation for jdicutil)
    k16jis1.fnt   (16x16 bitmap of the 3490 "JIS1" characters) (*)
    k16jis2.fnt   (16x16 bitmap of the 3388 "JIS2" characters) (*)
[ * The edict and k16jis files may not be included in this archive as they
are available elsewhere. They will be included in the version distributed via
the comp.binaries.ibm.pc newsgroup.]
The files will need to be unpacked and copied into a directory on your hard
disk. Rename "edictj" to "edict"; it has been called "edictj" in case you are
storing the files in the same directory as your MOKE files (e.g. \kanji), in
which case you need to be careful not to overwrite MOKE's "edict" file. In
addition, the 16-bit JIS font files "k16jis1.fnt" and "k16jis2.fnt" must be in
this directory. These latter files may not be included in this distribution
(please check). If you use MOKE, you have them already. You may need to track
them down at one of the FTP sites. In the MOKE1.1 distribution they are in the
MK11K16.ZIP archive file. Another source is kd100.arc, the KD distribution
file.
The executable (jdic.exe) will need to be stored in a directory on your path if
you wish to invoke JDIC from any directory. The simplest approach is to add
\kanji to your path.
The following environment variables may be set (note that they are the same
environment variables used by MOKE):
bgi=directory (the directory containing the bgi files. E.g. c:\tc or
c:\kanji. If this is not present, the bgi files must be in the directory
in which JDIC is invoked. Alternatively you can specify a bgi directory in
the "moke.rc" file (see below).)
mokerc=directory (the directory containing the moke.rc file. E.g.
c:\kanji. If this is not present, the current directory alone will be
searched for a file called moke.rc, and the details extracted.)
jgraphic=ATT (set this to ATT400 if you have an AT&T high-resolution card.
Otherwise it will default to CGA. NB: MOKE does not use this variable.
Again, you can also specify ATT in the "moke.rc" file.)
MOKE.RC
-------
The "moke.rc" file contains control information relating to the operation of
MOKE. JDIC examines this file, if it is present, and adjusts its own
environment (colours, directories, etc.) accordingly. (If you use MOKE, you
will have a moke.rc file already.) A "moke.rc" file is not necessary for
successful operation of JDIC; however, it is the only way for the user to
specify screen colours.
The environment variable "mokerc" may be used to specify the directory
containing "moke.rc". Otherwise the file must be in the directory from which
JDIC is invoked.
The file contains one or more lines of text, each comprising a key-word and a
value. The key-words recognized by JDIC are:
kanjipath directory-path (e.g. C:\KANJI)
If you wish to operate JDIC from other directories, you must have this line to
tell the program the location of the control and font files.
graphicstype ATT
Use this if you have an ATT400 graphics adaptor, otherwise the auto-detect will
set the operation to CGA.
bgipath directory-path
This is an alternative to using the "BGI" environment variable.
asciicolor <colour>
backcolor <colour>
kanjicolor <colour>
progcolor <colour>
Use these to set the colours to be used for English text, the screen
background, Japanese text and the control information, respectively. The
available colours are:
black, blue, green, cyan, red, magenta, brown, lightgray, darkgray, lightblue,
lightgreen, lightcyan, lightred, lightmagenta, yellow and white.
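As an illustration of the key-word/value layout described above, here is a
minimal sketch in Python of how such a file could be read. It is not JDIC's
own code (JDIC is written in C). The function name parse_mokerc is made up
for this example, and treating key-words as case-insensitive and silently
skipping unknown ones are assumptions.

```python
# Key-words the documentation lists as recognized by JDIC.
KNOWN_KEYS = {
    "kanjipath", "graphicstype", "bgipath",
    "asciicolor", "backcolor", "kanjicolor", "progcolor",
}

def parse_mokerc(text):
    """Return a dict mapping each recognized key-word to its value."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)      # key-word, then the rest of the line
        if len(parts) != 2:
            continue                     # blank line, or a key-word with no value
        key, value = parts[0].lower(), parts[1].strip()
        if key in KNOWN_KEYS:            # assumption: other key-words are ignored
            settings[key] = value
    return settings

sample = "kanjipath C:\\KANJI\ngraphicstype ATT\nbackcolor blue\n"
print(parse_mokerc(sample))
```

Lines carrying key-words that only MOKE understands simply fall through, which
matches the idea that JDIC shares the file with MOKE without needing all of it.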
OPERATION
---------
JDIC must operate on a PC or AT with a graphics card. It has been written
using Turbo C 2.0, and has been tested on VGA, CGA, ATT and HERC cards.
Auto-detection is used to determine the type of graphics card, except in the
case of an ATT board (see above).
The invocation of JDIC is:
jdic <options> <dictionary-file>
If no dictionary is specified, it is assumed to be "edict" in the local
directory, or in the directory specified by the "kanjipath" in moke.rc.
JDIC also needs an index file "<dicname>.jdx". If it is not present it will need
to be created by running the utility program JDXGEN (see below). JDXGEN saves
the length of the dictionary file and the JDIC version in the .jdx file, and if
JDIC detects a mismatch, it will refuse to operate.
The commandline options are:
-bnn
this restricts the number of 1K buffers used to hold the dictionary to nn.
The default is for JDIC to get as many buffers as it can, up to the size of
the dictionary.
-l
this inhibits JDIC from loading the dictionary initially. The program will
start more quickly, but will operate more slowly (at first) as the pages of
the dictionary are loaded on demand.
-f
forces the use of the 8x8 bgi fonts for English text. The default is to use
the bit-mapped fonts from the kiascii.fnt file, however the display of
these can be slow on an 8088 PC.
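The meaning of the three options and the optional dictionary name can be
summarised in a short Python sketch. This only models the behaviour described
above; the function name, the option dictionary and the error raised for an
unknown option are all assumptions, not a description of jdic.exe itself.

```python
def parse_command_line(args):
    """Interpret JDIC-style arguments (everything after the program name)."""
    options = {
        "buffers": None,        # None: take as many 1K buffers as possible
        "preload": True,        # -l turns off loading the dictionary at start
        "bgi_font": False,      # -f selects the 8x8 bgi font for English text
        "dictionary": "edict",  # the default dictionary name
    }
    for arg in args:
        if arg.startswith("-b") and arg[2:].isdigit():
            options["buffers"] = int(arg[2:])
        elif arg == "-l":
            options["preload"] = False
        elif arg == "-f":
            options["bgi_font"] = True
        elif not arg.startswith("-"):
            options["dictionary"] = arg
        else:
            # Assumption: the real program's reaction to a bad option is not documented.
            raise ValueError("unrecognized option: " + arg)
    return options

print(parse_command_line(["-b40", "-l", "mydict"]))
```

So "jdic -b40 -l mydict" would mean: at most 40 buffers, no initial load, and
a dictionary file called "mydict" instead of "edict".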
Operation of JDIC is very simple. After loading the dictionary, index and font
files, the full-screen working window is displayed with the "Search Key:"
prompt. Type a few letters from the *start* of the word you are seeking. JDIC
does not match on keys in the middle of words. The scan is case-insensitive.
A multi-line display is produced of all the dictionary entries which contain
matches with the search key. The display format is:
matched_word [kanji] (yomikata) english_1, english_2, etc
where "matched_word" is either ascii or kana, depending on the search key. If
the search key was kana (i.e. the match was on the yomikata), the separate
yomikata display is omitted.
A line is only displayed once per search, regardless of the number of matches.
If the search resulted in more entries than will fit on a screen, a further
prompt occurs at the bottom of the screen giving you the option of quitting
(Q), requesting another search (A) or requesting the next screen-full (M).
After the last screen of a search, the "Search Key:" prompt returns.
Exit from JDIC may be requested at any time by using the "Escape" key.
KANA SEARCHING
--------------
You will notice an "(A)" in the bottom lefthand corner of the screen. This is
to indicate you are entering search keys in ascii code (i.e. in English). If
you press F3 before entering a key, you toggle between (A)scii, (H)iragana and
(K)atakana. (Why F3? Well, that is the key that MOKE uses for this function.)
To enter a search key in kana, type it in romaji and it will be converted to
kana as you type. The romaji->kana translation is almost identical to that
used in MOKE, i.e. for a small "tsu" you can type either a double consonant,
e.g. "shippai", or "t-", e.g. shit-pai, and for "n" you can type "n'" if
necessary (e.g. as in "hon'ya"). Most of the time just typing ordinary Hepburn
or kunrei romaji works. Note that the romaji must follow the kana style for
long vowels. Tokyo must be toukyou, NOT tookyoo.
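The rules above (double consonant or "t-" for a small "tsu", "n'" for a
syllabic "n", kana-style long vowels) can be tried out with the following
Python sketch. It is a simplified stand-in, not the table JDIC takes from
MOKE: the table here is deliberately small, it only produces hiragana, and
all names in it are made up for the example.

```python
# Basic syllables: one row of kana per leading consonant, in a-i-u-e-o order.
ROWS = {
    "": "あいうえお", "k": "かきくけこ", "s": "さしすせそ", "t": "たちつてと",
    "n": "なにぬねの", "h": "はひふへほ", "m": "まみむめも", "r": "らりるれろ",
    "g": "がぎぐげご", "z": "ざじずぜぞ", "d": "だぢづでど", "b": "ばびぶべぼ",
    "p": "ぱぴぷぺぽ",
}
TABLE = {c + v: kana for c, row in ROWS.items() for v, kana in zip("aiueo", row)}
TABLE.update({"ya": "や", "yu": "ゆ", "yo": "よ", "wa": "わ", "wo": "を",
              "shi": "し", "chi": "ち", "tsu": "つ", "fu": "ふ", "ji": "じ"})
# Contracted sounds (kya, sho, ...): an i-column kana plus a small ya/yu/yo.
CONTRACTED = {"ky": "き", "gy": "ぎ", "sh": "し", "sy": "し", "ch": "ち",
              "ty": "ち", "ny": "に", "hy": "ひ", "by": "び", "py": "ぴ",
              "my": "み", "ry": "り", "j": "じ", "zy": "じ"}
for lead, base in CONTRACTED.items():
    for vowel, small in zip("auo", "ゃゅょ"):
        TABLE[lead + vowel] = base + small

def romaji_to_hiragana(text):
    """Convert romaji to hiragana, trying the longest syllable first."""
    text = text.lower()
    out, i = [], 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            ch, nxt = text[i], text[i + 1:i + 2]
            if ch == "n":
                out.append("ん")               # n before a consonant or at the end
                i += 2 if nxt == "'" else 1    # "n'" forces the syllabic n
            elif ch == "t" and nxt == "-":
                out.append("っ")               # "t-" gives a small tsu
                i += 2
            elif ch == nxt and ch.isalpha():
                out.append("っ")               # doubled consonant gives a small tsu
                i += 1
            else:
                out.append(ch)                 # leave anything unknown as-is
                i += 1
    return "".join(out)

for word in ("shippai", "shit-pai", "hon'ya", "toukyou"):
    print(word, romaji_to_hiragana(word))
```

Without the apostrophe, "honya" would be read as ho + nya, which is why the
"n'" form is sometimes necessary.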
The matching of kana keys is insensitive to whether they are katakana or
hiragana.
The display is in "dictionary" order for the words matched, i.e. alphabetical
for the ascii search, and EUC order for the kana search. EUC order is very
close to the "gojuuon" kana order in Japanese dictionaries except that it
separates the syllables with nigori and maru.
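A small Python sketch may make the two points above concrete: folding
katakana onto hiragana before comparing, and sorting by EUC byte values. The
function names are invented for the example, and the folding relies on the
Unicode layout (katakana sit 0x60 above the matching hiragana), which is an
implementation convenience here rather than a statement about how JDIC does
it.

```python
def fold_kana(text):
    """Map katakana to hiragana so that both kinds of kana compare equal."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ン" else ch
        for ch in text
    )

def euc_key(text):
    """Sort key: the bytes of the kana-folded text in EUC-JP coding."""
    return fold_kana(text).encode("euc_jp")

print(fold_kana("カメラ"))
print(sorted(["がく", "かさ", "カメラ", "き"], key=euc_key))
```

In EUC the voiced "ga" has the code immediately after "ka", so "kasa" sorts
ahead of "gaku" here, whereas a printed dictionary would normally ignore the
nigori on a first pass and put "gaku" first.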
[JDIC also supports MOKE's "SKK" mode for entering search keys. Ctrl-J will
cause a switch from ascii mode to hiragana, "l" will switch from kana to ascii
and "q" will toggle between hiragana and katakana.]
DISPLAY MODES
-------------
An "Unlimited Display Mode" is invoked by pressing F1 before or during the
entering of the search key. In this mode you will just keep scrolling through
the dictionary instead of stopping when you run out of matching words. Also
in this mode entries are displayed every time there is a match in the index
table (normally an entry is displayed once only.) This mode is useful for doing
maintenance on the dictionary, and for just browsing. If you use this mode you
may get some strange displays for entries which begin with kana and continue
with kanji, e.g. ocha.
A "Maintenance Mode" can be toggled on and off using F2 before or during the
entering of the search key. When in this mode the byte offset within the
dictionary of the matched word is displayed at the end of each line. Also the
number of page faults (if any) when reading the dictionary is displayed at the
top left-hand corner of the screen. The word "MAINT" is displayed at the top
of the screen when this mode is on, along with the number of dictionary page
buffers. This mode is useful for maintenance of edict, which is now too large
to edit in a single file, and for monitoring the dictionary paging mechanism.
DICTIONARY
----------
Clearly to be of any use, JDIC must have a reasonably good dictionary.
Optionally included with this distribution is the EDICTJ dictionary, which is
the author's extension of MOKE's EDICT. MOKE's EDICT was about 1900 entries,
and was compiled by Mark Edwards with help from Spencer Green. EDICTJ is now
nearly 6000 entries. (You can rename it `EDICT' and use it with MOKE; it has
been called EDICTJ to reduce the chance of accidental clobbering.) EDICTJ and
edictj.doc are available separately from several archives.
The dictionary file must use the "EUC" coding for Japanese characters. Files
using JIS codings can be converted to EUC using MOKE itself, or Ken Lunde's
"JIS.C" program.
The format of each entry in EDICT is:
Kanji [kana] /english_1/english_2/..../
or
kana /english_1/english_2/..../
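For readers who want to process the file themselves, the two entry layouts
above can be picked apart with a short Python sketch. The regular expression
and the function name parse_entry are illustrative only, and the sketch
assumes that the kanji or kana head-word contains no spaces. A real file
would be opened with the EUC coding, e.g. open("edict", encoding="euc_jp").

```python
import re

# head-word, optional [kana] reading, then /english_1/english_2/.../
ENTRY = re.compile(
    r"^(?P<head>\S+)\s+(?:\[(?P<kana>[^\]]+)\]\s+)?/(?P<english>.*)/\s*$"
)

def parse_entry(line):
    """Split one EDICT line into kanji, kana and a list of English glosses."""
    match = ENTRY.match(line)
    if match is None:
        return None                      # not in either of the two layouts
    head, kana = match.group("head"), match.group("kana")
    return {
        "kanji": head if kana else None, # kana-only entries have no kanji part
        "kana": kana if kana else head,
        "english": match.group("english").split("/"),
    }

print(parse_entry("猫 [ねこ] /cat/"))
print(parse_entry("カタログ /catalog/catalogue/"))
```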
JDXGEN
------
JDXGEN is a utility program that parses the dictionary file and produces a file
<dictionary-name>.jdx containing the index table (see below). Its operation is
similar to JDIC in that it uses moke.rc to establish the path to the dictionary
and will operate either on a dictionary called "edict" (the default) or a
user-specified name. There are no other options.
JDXGEN does an in-RAM sort of its index table. As this involves a large buffer
and far pointers it is quite slow. The 6000 entry dictionary results in about
23000 entries in the index, which takes about 8 minutes to sort on an XT.
TECHNICAL
---------
JDIC holds as much as it can of the complete dictionary in RAM, along with the
first 3490 bitmaps of the JIS character set and the index table. The index
table contains an entry for each word in the dictionary, sorted in alpha/kana
order. This enables a fast search to be done, and for the display to be in
alphabetical order by keyword. Common words like: "of", "to", "the", etc. and
grammatical terms like: "adj", "vi", "vt", etc. are not indexed.
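The idea of a sorted index with a fast search from the start of a word can be
sketched in Python as follows. This is only a model of the approach described
above, not the .jdx format or JDXGEN's code: it indexes ASCII words only (the
real index also holds kana and kanji keys), the skip list is a short
illustrative sample, and the names build_index and search are invented.

```python
import bisect
import re

# A sample of the words the documentation says are not indexed.
SKIP = {"of", "to", "the", "adj", "vi", "vt"}

def build_index(lines):
    """Return a sorted list of (lowercased word, line number) pairs."""
    index = []
    for lineno, line in enumerate(lines):
        for word in re.findall(r"[A-Za-z]+", line):
            key = word.lower()
            if key not in SKIP:
                index.append((key, lineno))
    index.sort()
    return index

def search(index, lines, prefix):
    """Return the matching lines, each once, in order of the matched word."""
    prefix = prefix.lower()
    start = bisect.bisect_left(index, (prefix,))   # binary search for the first candidate
    seen, hits = set(), []
    for key, lineno in index[start:]:
        if not key.startswith(prefix):
            break                                  # past the last word with this prefix
        if lineno not in seen:
            seen.add(lineno)
            hits.append(lines[lineno])
    return hits

entries = ["猫 [ねこ] /cat/", "犬 [いぬ] /dog/", "カタログ /catalog/catalogue/"]
index = build_index(entries)
print(search(index, entries, "CAT"))   # matches at the start of a word
print(search(index, entries, "at"))    # no match in the middle of a word
```

Because the index is sorted, the hits come out in alphabetical order of the
matched word, and a line that matches more than once is still shown only once.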
If the complete dictionary will not fit in RAM, the missing portions will be
paged in as they are required, using the good old LRU replacement method. The
index table is also paged in as required.
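For those unfamiliar with LRU (least recently used) replacement, here is a
minimal Python sketch of a pool of 1K page buffers. It shows the general
technique only; the class name, the read_page callback and the fault counter
are assumptions made for the example and say nothing about how JDIC's C code
is organised.

```python
from collections import OrderedDict

PAGE_SIZE = 1024   # the -b option counts buffers of 1K each

class PagedFile:
    """Keep at most max_pages pages in memory, discarding the least recently used."""

    def __init__(self, read_page, max_pages):
        self.read_page = read_page     # callable: page number -> bytes of that page
        self.max_pages = max_pages
        self.pages = OrderedDict()     # oldest use first, newest use last
        self.faults = 0                # number of times a page had to be read

    def page(self, number):
        if number in self.pages:
            self.pages.move_to_end(number)      # mark as most recently used
            return self.pages[number]
        self.faults += 1
        if len(self.pages) >= self.max_pages:
            self.pages.popitem(last=False)      # drop the least recently used page
        data = self.read_page(number)
        self.pages[number] = data
        return data

    def byte_at(self, offset):
        return self.page(offset // PAGE_SIZE)[offset % PAGE_SIZE]

# Demonstration with four pages of in-memory data and room for only two.
data = bytes(range(256)) * 16
paged = PagedFile(lambda n: data[n * PAGE_SIZE:(n + 1) * PAGE_SIZE], max_pages=2)
for offset in (0, 1024, 5, 2048):
    paged.byte_at(offset)
print(paged.faults, list(paged.pages))
```

In the demonstration, page 1 is the one discarded when page 2 is needed,
because page 0 was touched again more recently.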
If a kanji is required that is not in the ~3000 most common ones, it is read
from disk into a cache buffer. This happens rarely.
JDIC/JDXGEN can cope with dictionaries up to about 350 kbytes. JDIC itself
could handle a larger dictionary; the limitation is JDXGEN's in-RAM sort. If a
larger dictionary ever becomes available, another version of JDXGEN could operate
using a disk sort.
JDIC VERSIONS
-------------
Changes in Version 1.1
o ATT graphics card handling via an environment variable.
o fixes to the parsing of kanji/kana keys. The result is that the .jdx file
is about 20% larger than in V1.0.
Changes in Version 1.2
o fixes to the kana->romaji code to handle "nyu" properly.
o facility to use dictionaries other than "edict".
o Unlimited Display Mode.
Changes in Version 1.3
o immediate romaji->kana conversion
o examination of the "bgi" and "mokerc" environment variables, and the
"moke.rc" control file.
Changes in Version 1.4
o reformatting the output to start with the `hit' word, to put parentheses
around the kanji and kana, and to position the ascii text better with respect
to the kana/kanji.
o handling of an index table greater than 64k bytes.
Changes in Version 1.5
o inclusion of Maintenance Mode
o fixing of the backspace, so that it works in kana entry mode.
o demand paging of the dictionary file and index table
o removal of the .jdx generation into a separate program
o command-line options
o emulation of MOKE's SKK option
o use of colours as specified in moke.rc (*)
o extraction of ATT graphics request and bgi directory from moke.rc (*)
o use of the alternative ascii font (*)
o inclusion of correct romaji conversion for mya, myu and myo (*)
o modified handling of "-" in hiragana.(*)
(*) these items track changes to be made in MOKE V2.1
IT DOESN'T WORK!
----------------
Oh dear. If you do not get the introductory message, you probably have a
corrupted .exe. Try and get a clean copy. Also your environment might have
trouble with the output of a Turbo C 2.0 compilation/link.
If you actually get started, but cannot find anything, even when you put "a"
as a search key, delete your .jdx file and start again. If it still doesn't
work, mail the author a sample of your dictionary and copies of your
autoexec.bat and config.sys files.
ACKNOWLEDGEMENTS
----------------
A message from the author:
I wrote this program to gain experience in handling and displaying the Japanese
character set, and to exploit the dictionary that came with my copy of MOKE. I
also wanted to brush up my C skills. I make no great claims for it, but I am
pleased with how it turned out. Suggestions, comments and constructive criticism are
most welcome.
While I wrote most of this program, lumps of it were lifted with minor
modifications from "KD" (Kanji Driver), which was written by Izumi Ohzawa at
Berkeley, in particular the JIS handling module (kjis.c) which was a port of
"jis.pas" by Seiichi Nomura and Seke Wei.
Ken Lunde's "japan.inf" and his elegant "jis.c" explained the workings of EUC
and old/new JIS codes.
Mark Edwards' MOKE remains the tour de force in this field, and an inspiration
for us all. I regard JDIC as a humble and minor accessory to MOKE. (I use
tables lifted from two of the ".hlp" files in MOKE to drive the romaji->kana
code.)
My thanks to Mark Edwards, David Cowhig, Theresa Martin and Stephun Chung for
their helpful comments and suggestions.
Jim Breen
Department of Robotics & Digital Technology
Monash University
Melbourne, Australia
(jwb@monu6.cc.monash.edu.au)
May-December 1991