-
-
-
- SPELL V2.0 DOCUMENTATION
- Michael C. Adler
- December 22, 1982
-
- (C) 1982 Michael C. Adler
- This program has been released into the public domain by
- the author. It may neither be sold for profit nor included
- in a sold software package without permission of the
- author.
-
- The first SPELL using this dictionary was probably written
- by Ralph Gorin at Stanford. It was transported to MIT by
- Wayne Mattson. Both the program at MIT and the dictionary
- were most recently revised by William Ackerman at MIT.
- Section 5 of this document was copied from portions of Mr.
- Ackerman's documentation.
-
- Thanks to all for the effort spent designing the
- dictionary!
-
- Spell is a program, written for Z80 processors running CP/M,
- designed to detect misspellings in a document.
-
- 1. USING SPELL
-
- The minimum configuration of SPELL requires the files
- SPELL.COM and DICT.DIC (the main dictionary). At the time of
- execution, DICT.DIC must be on either the default drive or drive
- A:.
-
- The name of the file to be corrected must be included on the
- command line that is used to invoke spell. If a drive name is
- specified as a second file name, output is directed to the speci-
- fied drive. Thus,
-
- SPELL useless.doc
-
- will check the file "useless.doc" and direct output to the
- default drive and
-
- SPELL b:useless.doc c:
-
- will check the file "b:useless.doc" and direct output to disk c.
-
- Spell will check the input file for errors by comparing each
- word in the file to the dictionary. If a word is not found, a
- null (ascii 0) is placed before the word. To change this marking
- character, see section 4, PATCHING SPELL. If a backup version
- (.BAK file type) of the input file exists, it will be deleted.
- The input file will be renamed to a backup file and the checked
- file will replace the input file.
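-
- As an illustration of the marking scheme, the following Python
- sketch lists the words flagged in a checked file. It is not part of
- SPELL: the name flagged_words is hypothetical, and the sketch
- assumes the default null marker and the word definition given in
- section 5.1.
-
```python
import re

MARKER = b"\x00"  # default marking character (ASCII 0)


def flagged_words(data, marker=MARKER):
    """Return the words that directly follow the marker byte in checked text."""
    # A word is letters and apostrophes, not starting or ending with an apostrophe.
    pattern = re.compile(
        re.escape(marker) + rb"([A-Za-z](?:[A-Za-z']*[A-Za-z])?)"
    )
    return [m.group(1).decode("ascii") for m in pattern.finditer(data)]
```
-
- For example, flagged_words(b"a \x00tset") returns ["tset"].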
-
- 2. USER DICTIONARIES
-
- A user dictionary is a list of correct words that can be
- loaded by SPELL to augment the main dictionary. Words such as
- proper nouns can be placed in user dictionaries to inhibit error
- marking. User dictionary files may be formatted in any way that
- the user desires, as long as words are delimited by non-alphabe-
- tic characters.
-
- SPELL will automatically search for the user dictionary
- SPELL.DIC on the default drive and on drive A: if it is not on
- the default one. Its contents are then loaded and temporarily
- added to the dictionary. It must be loaded again to be included
- in subsequent executions of SPELL.
-
- SPELL will also automatically search for d:file.UDC, where
- file is the name of the file being corrected and d: is the drive
- on which file is found. If found, it is also loaded and tempo-
- rarily augments the dictionary. Thus, users may create separate
- dictionaries for each text file being corrected. After locating
- d:file.UDC, SPELL will search for the file d:file.ADD. This file is
- created by WordStar's ^QL command (see section 3) and is not an
- ASCII file. d:file.ADD contains commands generated by WordStar
- to include specific words in the user dictionary associated with
- d:file. SPELL will temporarily place all of the words in it in
- the dictionary and will also save the words by copying them into
- d:file.UDC.
-
- It is possible to load additional user dictionaries by
- specifying them on the SPELL command line. A list of user dic-
- tionaries must be preceded by a dollar sign. A dictionary is
- specified by a file name and an optional drive name. If no drive
- is specified, the default drive is searched and then drive A: is
- checked. Extensions are ignored and default to .DIC. Hence, the
- command line:
-
- SPELL useless.doc b: $dict1 c:dict2 dict3.fun
-
- would correct useless.doc and direct output to drive B:. User
- dictionary DICT1.DIC would be loaded from the default drive or
- drive A:, dictionary DICT2.DIC would be loaded from drive C:,
- and DICT3.DIC would be loaded from the default drive or drive A:.
- Notice that the extension .fun was ignored.
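-
- The rules above can be sketched in Python as follows. This is only
- an illustration, not SPELL's own parser: the name parse_spell_args
- and the tuple layout are hypothetical, and the sketch assumes that
- the command tail is upper-cased, as CP/M normally does. A drive of
- None stands for "default drive, then drive A:".
-
```python
def parse_spell_args(tail):
    """Split a SPELL command tail into (input_file, output_drive, dictionaries)."""
    before, _, after = tail.partition("$")
    parts = before.split()
    if not parts:
        raise ValueError("an input file name is required")
    input_file = parts[0].upper()
    output_drive = parts[1].upper() if len(parts) > 1 else None

    dictionaries = []
    for name in after.split():
        name = name.upper()
        drive = None
        if len(name) > 1 and name[1] == ":":
            drive, name = name[:2], name[2:]
        base = name.split(".", 1)[0]  # any extension given is ignored
        dictionaries.append((drive, base + ".DIC"))
    return input_file, output_drive, dictionaries
```
-
- Applied to "useless.doc b: $dict1 c:dict2 dict3.fun", the sketch
- yields the input file USELESS.DOC, the output drive B:, and the
- dictionaries DICT1.DIC, C:DICT2.DIC and DICT3.DIC.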
-
- 3. WordStar's ^QL COMMAND
-
- Files checked by SPELL can be corrected using WordStar. In
- response to ^QL, the user is asked which portions of the file
- should be searched. WordStar will then position the cursor on
- the first marked word and print a menu offering F (Fix word), B
- (Bypass word), I (Ignore word), D (Add to dictionary), and S (Add
- to supplemental dictionary). The F option deletes the error
- marker and returns to the WordStar main menu, allowing the user
- to correct the word. B will leave the word marker and will
- search for the next misspelled word. In this implementation of
- SPELL, the I, D and S options all perform the same function
- (although I is easier to use because no question is asked by
- WordStar). If any of these options (I, D, S) is chosen, the
- mark will be removed and the word will be added to file.ADD.
- Thus, choosing these options informs SPELL that the word is cor-
- rect and should not be marked again. The D and S options do not
- add the word to SPELL's main dictionary because the compression
- method used to store the dictionary is too complicated to allow
- such modification efficiently. After any option other than F is
- chosen, WordStar will automatically search for the next marked
- word.
-
- 4. PATCHING SPELL
-
- It is not necessary to recompile SPELL to change the charac-
- ter that marks misspelled words. The byte at 0800H contains the
- marking character. In the distribution version of SPELL, it is
- null, or 0. DDT or another debugger can be used to change 0800H
- to the ASCII value of the desired marker.
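-
- For readers without DDT at hand, the same patch can be sketched in
- Python on a copy of the file. The sketch relies on the fact that
- CP/M loads .COM files at 0100H, so address 0800H corresponds to
- file offset 0700H. The name patch_marker is hypothetical, and the
- sketch has not been tried against a real SPELL.COM.
-
```python
LOAD_ADDRESS = 0x0100    # CP/M loads .COM files at 0100H
MARKER_ADDRESS = 0x0800  # address of the marking character, per this section


def patch_marker(image, marker):
    """Return a copy of a SPELL.COM image with a new marking character."""
    offset = MARKER_ADDRESS - LOAD_ADDRESS
    if len(marker) != 1 or ord(marker) > 0x7F:
        raise ValueError("marker must be a single ASCII character")
    if offset >= len(image):
        raise ValueError("image is too short to contain the marker byte")
    patched = bytearray(image)
    patched[offset] = ord(marker)
    return bytes(patched)
```
-
- To apply it, read the bytes of a copy of SPELL.COM, pass them to
- patch_marker together with the desired character (for example "#",
- which stores 23H), and write the result back.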
-
- 5. PROGRAM AND DICTIONARY CHARACTERISTICS
-
- 5.1 Word identification algorithm
-
- A word is any uninterrupted sequence of letters and
- apostrophes, which does not begin or end with an apostrophe.
- Any punctuation, digit, or control character separates words.
- Any word consisting of a single letter, or any word more than
- 40 letters long, is considered to be correctly spelled.
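-
- A minimal Python sketch of this definition follows. The names
- split_words and needs_lookup are hypothetical. The sketch assumes
- that a leading or trailing apostrophe is simply treated as a
- separator and that apostrophes count toward the 40-character
- limit; the text above does not settle either point.
-
```python
import re

# Letters and apostrophes, neither beginning nor ending with an apostrophe.
WORD_RE = re.compile(r"[A-Za-z](?:[A-Za-z']*[A-Za-z])?")


def split_words(text):
    """Return the words of text; digits, punctuation and controls separate them."""
    return WORD_RE.findall(text)


def needs_lookup(word):
    """Single letters and words longer than 40 are accepted without lookup."""
    return 1 < len(word) <= 40
```
-
- For example, split_words("Don't stop, B2B") returns
- ["Don't", "stop", "B", "B"], and neither "B" needs a lookup.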
-
- 5.2 Dictionary policy
-
- It is the policy of this program to contain only one
- spelling of a word, even if ordinary dictionaries show two
- or more "acceptable" spellings. Hence, the dictionary
- contains LABELED and LABELING, but not LABELLED or LABELLING,
- even though all four are actually acceptable. The intention
- is to enforce uniformity within each document. The author
- apologizes for the restriction on creativity and diversity
- that this necessitates, but believes that it is the best policy
- for this program.
-
- The dictionary contains many technical and computer
- terms such as MICROPROGRAM and DEBUGGER, but does not contain
- extreme jargon words such as CONTROLIFY or VALRET. The
- dictionary contains no proper names other than names of countries
- and states of the United States. The reason is that it
- would be virtually impossible to contain all of the proper names
- that commonly arise in normal use. Users should keep proper
- names (and other correctly spelled words) that arise in
- their own work in private dictionaries to avoid having to repeat-
- edly tell SPELL to accept them.
-
- The dictionary is significantly smaller than that found
- in other spelling checkers, such as the DEC TOPS-20 program.
- The author believes that the larger dictionary would not reduce
- the number of false misspelling indications by very much.
-
- [Note: I believe that this dictionary is actually MUCH larger
- than any dictionaries currently available for microcomputers.
- -Michael]
-
- 5.3 Dictionary flags
-
- Words in SPELL's main dictionary (but not the other dictio-
- naries) may have flags associated with them to indicate the
- legality of suffixes without the need to keep the full
- suffixed words in the dictionary. The flags have "names" consis-
- ting of single letters. Their meaning is as follows:
-
- Let # and @ be "variables" that can stand for any letter.
- Upper case letters are constants. "..." stands for any
- string of zero or more letters, but note that no word may
- exist in the dictionary which is not at least 2 letters long, so,
- for example, FLY may not be produced by placing the "Y" flag
- on "F". Also, no flag is effective unless the word that it
- creates is at least 4 letters long, so, for example, WED
- may not be produced by placing the "D" flag on "WE".
-
- "V" flag:
- ...E --> ...IVE as in CREATE --> CREATIVE
- if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE
-
- "N" flag:
- ...E --> ...ION as in CREATE --> CREATION
- ...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION
- if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN
-
- "X" flag:
- ...E --> ...IONS as in CREATE --> CREATIONS
- ...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS
- if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS
-
- "H" flag:
- ...Y --> ...IETH as in TWENTY --> TWENTIETH
- if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH
-
- "Y" FLAG:
- ... --> ...LY as in QUICK --> QUICKLY
-
- "G" FLAG:
- ...E --> ...ING as in FILE --> FILING
- if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING
-
- "J" FLAG:
- ...E --> ...INGS as in FILE --> FILINGS
- if # .ne. E, ...# --> ...#INGS as in CROSS --> CROSSINGS
-
- "D" FLAG:
- ...E --> ...ED as in CREATE --> CREATED
- if @ .ne. A, E, I, O, or U,
- ...@Y --> ...@IED as in IMPLY --> IMPLIED
- if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
- ...@# --> ...@#ED as in CROSS --> CROSSED
- or CONVEY --> CONVEYED
-
- "T" FLAG:
- ...E --> ...EST as in LATE --> LATEST
- if @ .ne. A, E, I, O, or U,
- ...@Y --> ...@IEST as in DIRTY --> DIRTIEST
- if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
- ...@# --> ...@#EST as in SMALL --> SMALLEST
- or GRAY --> GRAYEST
-
- "R" FLAG:
- ...E --> ...ER as in SKATE --> SKATER
- if @ .ne. A, E, I, O, or U,
- ...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER
- if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
- ...@# --> ...@#ER as in BUILD --> BUILDER
- or CONVEY --> CONVEYER
-
- "Z" FLAG:
- ...E --> ...ERS as in SKATE --> SKATERS
- if @ .ne. A, E, I, O, or U,
- ...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS
- if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
- ...@# --> ...@#ERS as in BUILD --> BUILDERS
- or SLAY --> SLAYERS
-
- "S" FLAG:
- if @ .ne. A, E, I, O, or U,
- ...@Y --> ...@IES as in IMPLY --> IMPLIES
- if # .eq. S, X, Z, or H,
- ...# --> ...#ES as in FIX --> FIXES
- if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
- ...# --> ...#S as in BAT --> BATS
- or CONVEY --> CONVEYS
-
- "P" FLAG:
- if @ .ne. A, E, I, O, or U,
- ...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS
- if # .ne. Y, or @ = A, E, I, O, or U,
- ...@# --> ...@#NESS as in LATE --> LATENESS
- or GRAY --> GRAYNESS
-
- "M" FLAG:
- ... --> ...'S as in DOG --> DOG'S
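-
- To make the notation concrete, here is a Python sketch of two of
- the rules, "D" and "S", applied to upper-case roots. The function
- names are hypothetical, and the sketch only restates the rules as
- written above, including the requirement that the resulting word
- be at least 4 letters long. It is not taken from SPELL's source.
-
```python
VOWELS = "AEIOU"


def apply_d(root):
    """Apply the "D" flag rule to an upper-case root of 2 or more letters."""
    if root.endswith("E"):
        return root + "D"                # CREATE -> CREATED
    if root.endswith("Y") and root[-2] not in VOWELS:
        return root[:-1] + "IED"         # IMPLY -> IMPLIED
    return root + "ED"                   # CROSS -> CROSSED, CONVEY -> CONVEYED


def apply_s(root):
    """Apply the "S" flag rule to an upper-case root of 2 or more letters."""
    if root.endswith("Y") and root[-2] not in VOWELS:
        return root[:-1] + "IES"         # IMPLY -> IMPLIES
    if root[-1] in "SXZH":
        return root + "ES"               # FIX -> FIXES
    return root + "S"                    # BAT -> BATS, CONVEY -> CONVEYS


def expand(root, rule):
    """Return the suffixed word, or None if it is shorter than 4 letters."""
    word = rule(root)
    return word if len(word) >= 4 else None
```
-
- With these definitions, expand("WE", apply_d) returns None, which
- matches the WED example given earlier.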
-
- Note: The existence of a flag on a root word in the dictionary
- is not by itself sufficient to cause SPELL to recognize the
- indicated word ending. If there is more than one root for
- which a flag will indicate a given word, only one of the roots
- is the correct one for which the flag is effective; generally it
- is the longest root. For example, the "D" rule implies that
- either PASS or PASSE, with a "D" flag, will yield PASSED. The
- flag must be on PASSE; it will be ineffective on PASS. This
- is because, when SPELL encounters the word PASSED and fails to
- find it in its dictionary, it strips off the "D" and looks
- up PASSE. Upon finding PASSE, it then accepts PASSED if and
- only if PASSE has the "D" flag. Only if the word PASSE is not in
- the main dictionary at all does the program strip off the "E"
- and search for PASS. Furthermore, some combinations of flags
- are forbidden to allow for dense flag encoding to save space.
- For example, only one of the "P", "J", or "V" flags may be on in
- any one word.
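-
- The lookup order described in the note can be sketched in Python
- for the "D" flag. The dictionary here is a plain mapping from root
- to a set of flag letters, a stand-in for DICT.DIC, and the names
- d_roots and accepts are hypothetical. Only the PASSE-before-PASS
- order comes from the note; the position of the ...IED case in the
- search order is an assumption.
-
```python
VOWELS = "AEIOU"


def d_roots(word):
    """Candidate roots for a word ending in "ED", longest first."""
    if len(word) < 4 or not word.endswith("ED"):
        return []
    roots = [word[:-1]]                       # ...E + D
    stem = word[:-2]
    if stem.endswith("I") and stem[-2] not in VOWELS:
        roots.append(stem[:-1] + "Y")         # ...@Y -> ...@IED
    last = stem[-1]
    if last not in "EY" or (last == "Y" and stem[-2] in VOWELS):
        roots.append(stem)                    # ...@# + ED
    return roots


def accepts(word, dictionary):
    """Return True if word is a root or a valid "D" form of a root."""
    if word in dictionary:
        return True
    for root in d_roots(word):
        if root in dictionary:
            # The first root found decides; shorter roots are never tried.
            return "D" in dictionary[root]
    return False
```
-
- With PASSE present but unflagged, accepts("PASSED", ...) is False
- even when PASS carries the "D" flag, as the note describes.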
-
- 6. SPELL INTERNALS
-
- SPELL uses a number of temporary files during execution.
- The file file.D$$ is the union of file.UDC and file.ADD. At the
- end of execution, file.UDC and file.ADD are deleted and file.D$$
- is renamed to file.UDC. The file file.$$$ is the output file.
- At the end of execution, file.BAK is deleted, the input file is
- renamed to file.BAK, and file.$$$ is renamed to the input file
- name. Warning: if you do not have room on your disk for
- file.BAK, file.DOC and file.$$$ at the same time, either use two
- drives or delete file.BAK before you start.
-
- SPELL corrects files with two passes of the input file. On
- the first pass, the words in the file are sorted alphabetically
- and duplicate words are eliminated. An attempt is then made to
- search for the words in the dictionary. Words that are found are
- marked. On the second pass of the input file, SPELL determines
- whether each word was found by locating it in memory. This
- method makes the operation of SPELL more efficient because common
- words must be looked up only once and because the dictionary can
- be searched sequentially, minimizing disk head travel. If all of
- the file does not fit in memory on the first pass, the input file
- is partitioned into sections small enough to fit into memory and
- is then corrected in a series of two pass operations until the
- entire file has been checked. It is unlikely that memory will be
- filled in large systems by even large text files as 3000 individ-
- ual words should fit easily.
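-
- The two-pass scheme can be sketched in Python as follows. This is
- a simplified illustration, not SPELL's code: the dictionary is an
- in-memory sorted list of upper-case words rather than the
- compressed DICT.DIC, suffix flags and partitioning are ignored,
- comparison is assumed to be case-insensitive, and the name
- check_text is hypothetical.
-
```python
import re

WORD_RE = re.compile(r"[A-Za-z](?:[A-Za-z']*[A-Za-z])?")


def check_text(text, sorted_dictionary, marker="\x00"):
    """Mark every word of text that is absent from sorted_dictionary."""
    # Pass 1: collect the distinct words and sort them.
    wanted = sorted(
        {w.upper() for w in WORD_RE.findall(text) if 1 < len(w) <= 40}
    )

    # Sweep the sorted dictionary once, in step with the sorted word list.
    found = set()
    entries = iter(sorted_dictionary)
    entry = next(entries, None)
    for word in wanted:
        while entry is not None and entry < word:
            entry = next(entries, None)
        if entry == word:
            found.add(word)

    # Pass 2: re-read the text and mark each word that was not found.
    def mark(match):
        w = match.group(0)
        if 1 < len(w) <= 40 and w.upper() not in found:
            return marker + w
        return w

    return WORD_RE.sub(mark, text)
```
-
- Because both lists are sorted, the dictionary is read front to
- back exactly once, however often a word occurs in the text.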
-
- 7. DICTIONARY INTERNALS
-
- The dictionary has been compressed, significantly, in order
- to save space. Dictionary records are all 256 bytes long and
- each record contains as many words as will fit. Individual words
- are stored in the following code:
-
- 4 bits -- Number of characters to copy from the previous
- word. Because the dictionary is stored in
- alphabetical order, this saves a large number of
- characters. This field is 0 at the beginning of
- each record.
-
- x * 5 bits -- Characters are stored in 5 bit code. There may be
- any number of 5 bit characters. A character
- string is terminated by the following field.
-
- 3 bits -- Set to 111 binary to indicate the end of the word.
- Since 11100 binary is greater than 26, all
- alphabetic characters can be stored without using
- this combination.
-
- 4 bits -- Number of bits of flag data following the word.
- The bit position of the flags has been ordered so
- that the flags most frequently used are earliest.
- Flags not stored are assumed to be off.
-
- x bits -- Flag data. x is determined by the previous field.
- Each bit represents one of the 14 suffix flags.
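-
- The field layout above can be tried out with the Python sketch
- below, which encodes and decodes a short word list as a list of
- bits. Several details are assumptions, since the text does not
- give them: bits are taken most significant first, letters A-Z are
- coded 1 to 26 and the apostrophe 27, and the decoder is told how
- many words to read instead of detecting the end of a 256-byte
- record. All names are hypothetical.
-
```python
def char_code(ch):
    # Assumed 5-bit code: A-Z -> 1..26, apostrophe -> 27 (never starts with 111).
    return 27 if ch == "'" else ord(ch) - ord("A") + 1


def code_char(code):
    return "'" if code == 27 else chr(code + ord("A") - 1)


def to_bits(value, width):
    """Return value as a list of bits, most significant bit first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def from_bits(bits):
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def encode(entries):
    """Encode (word, flag_bits) pairs given in dictionary order."""
    out, previous = [], ""
    for word, flags in entries:
        shared = 0
        limit = min(15, len(previous), len(word))  # the count field is 4 bits
        while shared < limit and previous[shared] == word[shared]:
            shared += 1
        out += to_bits(shared, 4)
        for ch in word[shared:]:
            out += to_bits(char_code(ch), 5)
        out += [1, 1, 1]                           # end-of-word marker
        out += to_bits(len(flags), 4)
        out += list(flags)
        previous = word
    return out


def decode(bits, count):
    """Decode count entries from a bit list produced by encode."""
    entries, previous, pos = [], "", 0
    for _ in range(count):
        shared = from_bits(bits[pos:pos + 4])
        pos += 4
        word = previous[:shared]
        while bits[pos:pos + 3] != [1, 1, 1]:
            word += code_char(from_bits(bits[pos:pos + 5]))
            pos += 5
        pos += 3
        nflags = from_bits(bits[pos:pos + 4])
        pos += 4
        entries.append((word, bits[pos:pos + nflags]))
        pos += nflags
        previous = word
    return entries
```
-
- Under these assumptions CREATE, CREATURE and DOG, carrying 3, 0
- and 1 flag bits respectively, occupy 97 bits in total.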
-
- 8. MODIFYING THE MAIN DICTIONARY
-
- The source for the main dictionary can currently be found in
- the file "[MIT-XX]SRC:<WBA>SPELL.DCT". In order to make it
- compatible with SPELL, all of the "/" characters that delimit flags
- must be converted to "%" characters so that flags will be
- considered earlier in the alphabet than apostrophes and hyphens
- (DOG%S should be before DOG'S). The file must then be sorted
- alphabetically. No utilities are provided with SPELL to accomplish
- either of these tasks.
- Without high capacity disk drives, you may find it necessary to
- perform the above steps on a larger computer.
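-
- On a larger computer, the two steps might be sketched in Python
- as follows. The sketch assumes one entry per line in the form
- WORD/FLAGS, which the text above does not confirm, and it sorts
- by ASCII character code so that "%" sorts ahead of the
- apostrophe. The name prepare_source is hypothetical.
-
```python
def prepare_source(lines):
    """Convert flag delimiters from "/" to "%" and sort in ASCII order."""
    entries = [line.strip().replace("/", "%") for line in lines if line.strip()]
    # Python compares ASCII strings character by character using their codes.
    return sorted(entries)
```
-
- For example, the entries DOG'S, DOG/S, DOGMA and DOG come out in
- the order DOG, DOG%S, DOG'S, DOGMA.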
-
- Once a copy of the main dictionary has been placed on the
- microcomputer, use the program DICCRE to create a dictionary.
- Include the name of the source file on the DICCRE command line.
- DICCRE will create the files DICT.DIC (compressed dictionary) and
- SPELL0.MAC (pointer file to dictionary) ON THE DEFAULT DISK
- DRIVE. When it has finished converting the input file to the
- dictionary file, it will execute a warm boot if the output file
- is on the same drive as the input file. However, if the output
- file is not on the same disk, it will ask whether another input
- file exists. This feature allows the user to put the source file
- on two disks in case it does not fit on one. DICCRE will combine
- them into one dictionary file. If no more files exist, answer N
- to the question. If another file does exist, put the disk with
- the new file in the input drive and type Y.
-
- After the dictionary file has been created, it is necessary
- to recompile SPELL with the new pointer file, SPELL0.MAC. If
- your assembler does not support the INCLUDE statement, you will
- have to replace the line INCLUDE SPELL0.MAC in the file SPELL.MAC
- with the contents of SPELL0.MAC. After SPELL is recompiled, be
- sure to use the correct copy of DICT.DIC with it or you will
- obtain unpredictable results.
-
- For more information about dictionaries, see the file:
- [MIT-XX]SS:<WBA>DICT.LETTER
-
- Good luck and happy hacking!
-
- Michael Adler (MADLER@MIT-ML)
- 3 Sunny Knoll Terrace
- Lexington, MA 02173