Spell/ST User Guide v3.2
Murray Langton and David Tilley
April, 1991
A versatile English spelling checker and dictionary maintainer
Text enclosed in square brackets and the Appendices
may be skipped at a first reading
0  INTRODUCTION                   2  USING MAKEDICT.PRG
   0.0  Distribution                 2.0  Introduction
   0.1  System Requirements          2.1  The 'Desk' Menu
   0.2  Files You Need               2.2  The 'Dictionaries' Menu
   0.3  Overview
   0.4  Limitations
   0.5  Your Comments             APPENDICES
1  USING SPELL.PRG                   A  What is a Word?
   1.0  Introduction                 B  Dictionary Format
   1.1  The 'Desk' Menu              C  Fatal Error Messages
   1.2  The 'File' Menu              D  Background Information
   1.3  The 'Options' Menu           E  Known Bugs
   1.4  The 'Dictionaries' Menu      F  Improvements
0 INTRODUCTION
0.0 Distribution
Spell/ST and MakeDict are Copyright 1991 Murray Langton and David
Tilley.
Spell/ST and MakeDict are in the public domain. Commercial use of
all or any part of this software or its dictionaries is for-
bidden.
Spell/ST's dictionaries were accumulated from many sources over
the years, too many to acknowledge individually. However, the
following deserve special mention:
Users and staff of the University of London Computer Centre
Jeff Horne's 'Codebreaker' disc of UK telephone exchange codes
Mistakes in Spell/ST's dictionaries remain our fault, although we
accept no liability for them. We also accept no liability for use
of the Spell/ST and MakeDict programs.
All manufacturers' trademarks are acknowledged.
0.1 System Requirements
You need an Atari ST with at least 400 kilobytes' free memory to
run Spell/ST. On a 520 ST or a 1040 ST, you may have to remove
one or more of your desktop accessories before Spell/ST will
work. MakeDict requires at least 360 kilobytes' free memory.
Since neither program has been tested on an STE or on versions of
TOS later than 1.0 (1985), we would appreciate reports on how
they behave with such systems.
Both Spell/ST and MakeDict work with Atari's high- or medium-
resolution monitor, but not in low resolution. They have not been
tested with large screens.
Spell/ST and MakeDict are designed to be run from hard disc or
ram disc; they are very slow when used from diskette.
Because MakeDict needs access to the '.WRD' files and enough disc
space to create the MASTER.DIC and MASTER.IND files, it cannot
run satisfactorily on systems having only one single-sided
diskette drive.
0.2 Files You Need
Before using Spell/ST, you should ensure that you have the foll-
owing six files:
    File name       Description

    MASTER.DIC      Spell/ST's master dictionary (binary)
    MASTER.IND      Index to Spell/ST's master dictionary (binary)
    READ.ME         Information not in SPELL.ASC
    SPELL.ASC       This document
    SPELL.PRG       Executable Spell/ST program
    SPELL.RSC       Spell/ST's resource file
All should be placed in the same folder as SPELL.PRG or in the
root directory of the drive containing it.
Before using MakeDict, you should ensure that you have at least
the following eight files:
    File name       Description

    COMPUTER.WRD    Dictionary of computerese terms (text)
    MAIN.WRD        Dictionary of English words (text)
    MAKEDICT.ASC    This document (same as SPELL.ASC)
    MAKEDICT.PRG    Executable MakeDict program
    MAKEDICT.RSC    MakeDict's resource file
    MAKEDICT.SET    Default location of '.WRD', '.DIC' and '.IND'
                    files (not to be edited)
    NAMES.WRD       Dictionary of names and places (text)
    READ.ME         Information not in MAKEDICT.ASC
All should be placed in the same folder as MAKEDICT.PRG or in the
root directory of the drive containing it.
You do not need MakeDict to run Spell/ST. They could be supplied
separately by your BBS or archive system.
0.3 Overview
Spell/ST reads text from a file, breaks that text up into words,
looks up each word in a dictionary, and displays those words that
it couldn't find. It has some knowledge of plurals, suffixes and
prefixes. Spell/ST ignores some text-formatting commands and
works on 1st Word Plus '.DOC' files. It also detects consecutive
duplicate words and most common split infinitives.
Various optional facilities are available:
o four formats for the display of unrecognised words
o ignore words containing only upper-case letters, digits, and
special characters
o suggestions for the correct spelling of unrecognised words
o supply up to three personal dictionaries to supplement the
standard dictionaries
o supply a personal 'reject' dictionary containing words which
are to be treated as unrecognised, even if they appear in
another dictionary
o choose which, if any, of the dictionaries should be used
o create a dictionary
o log unrecognised words
o add Spell/ST's report to a file
There are many words which can be spelt with either 'ise' or
'ize' at the end, recognise or recognize, for example. For such
words, Spell/ST will accept either form (or most variants), but
will report those words which are spelt inconsistently within a
document.
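
The consistency check described above might be sketched as
follows. This is only an illustration in Python, not Spell/ST's
own code: the function name and the list of endings are
assumptions made for the example, and Spell/ST itself relies on
the 'x' affix flag (see Appendix B) rather than pattern matching.

```python
import re
from collections import defaultdict

# Endings treated as -ise/-ize variants (an assumed, incomplete list).
_ISE_IZE = re.compile(r"^(.+)i([sz])(e|es|ed|ing|ation|ations)$")


def inconsistent_ise_ize(words):
    """Return {stem: sorted spellings} for stems seen with both s and z."""
    letters = defaultdict(set)    # stem -> {"s", "z"} as encountered
    spellings = defaultdict(set)  # stem -> the actual words encountered
    for word in words:
        lowered = word.lower()
        match = _ISE_IZE.match(lowered)
        if match:
            stem = match.group(1)
            letters[stem].add(match.group(2))
            spellings[stem].add(lowered)
    return {stem: sorted(spellings[stem])
            for stem in letters if len(letters[stem]) == 2}
```

A bare pattern test like this would wrongly pair genuinely
different words such as 'prise' and 'prize', which is one reason a
per-word flag in the dictionary is the safer design.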
The master dictionary used by Spell/ST is constructed from three
smaller dictionaries for ease of maintenance. The 'main' diction-
ary contains English words (but not words valid only in American
English), the 'names' dictionary names of people and places, and
the 'computer' dictionary computing terms. A few words may appear
in more than one dictionary. The master dictionary may optionally
contain a reject dictionary and three other dictionaries con-
structed by you.
As far as Spell/ST is concerned, all dictionaries are contained
in one file called MASTER.DIC. The MakeDict program is used to
construct a MASTER.DIC file from the various '.WRD' files supp-
lied with Spell/ST or constructed by you. MakeDict is described
in Section 2.
0.4 Limitations
Please note that Spell/ST merely reports words from a document
which were not found in a dictionary; these represent _potential_
spelling mistakes rather than actual mistakes. In practice, there
will be some spelling mistakes, some technical terms, some abbre-
viations, and some words which are correctly spelt but which are
not (yet) in a dictionary.
Spell/ST cannot locate mistakes which produce some other correct-
ly spelt word, nor does it check context, grammar or punctuation.
Spell/ST has no facilities for interactive spelling correction.
Areas where Spell/ST is known to be weak include the following:
o No distinction is made between upper- and lower-case letters
o No attempt is made to check that 's is used correctly at
the end of a word, though other abbreviations using ' are
checked
o Hyphenated words are not always recognised, especially if
they are split at the end of a line
Please see Appendices D, E and F for further discussion.
0.5 Your Comments
Comments on how Spell/ST or MakeDict may be improved should be
e-mailed to David Tilley at:
DRT10@UK.AC.CAM.PHX on JANET
or DRT10%PHX.CAM.AC.UK@CUNYVM.CUNY.EDU on Internet
or telephoned to +44 (0)81-399 8372; but please see Section 0.4
and Appendices D, E and F before making them. Your suggestions
for additions or corrections to Spell/ST's dictionaries should be
sent by e-mail; other people could benefit from them in a later
release.
1 USING SPELL.PRG
1.0 Introduction
Double-click on the SPELL.PRG icon. You will then see the titles
of five drop-down menus. Their functions are described below.
1.1 The 'Desk' Menu
The 'Desk' menu looks something like the following:
     ----
    |Desk|
    |-------------------
    | About Spell/ST... |
     -------------------
Our copyright notice and Spell/ST's version number are displayed
when you click on the 'About Spell/ST...' item.
1.2 The 'File' Menu
The 'File' menu looks something like the following:
     ----
    |File|
    |-------------------
    | Scan document...  |
    | Report to disc    |
    |-------------------|
    | Quit Spell/ST     |
     -------------------
Scan document:
Use this item to bring up the usual (or your preferred) file sel-
ector. Choose the name of the file containing the text whose
spelling is to be checked. The first time you use this item,
Spell/ST loads its master dictionary; this takes about four sec-
onds from hard disc, and about fifty seconds from diskette. You
can use this item to check in turn as many documents as you wish.
The time taken to produce a complete report varies according to
the size of a document and your equipment. For example, Spell/ST
took about one minute to scan a copy of this guide held on hard
disc and about one-and-a-quarter minutes for a copy on diskette.
Report to disc:
Spell/ST normally sends its reports to a scrolling window only.
If you click on this item, the report will also be added to a
file, SPELL.REP, in the folder from which SPELL.PRG was executed.
The report produced by Spell/ST will contain:
o the location of split infinitives and consecutive duplicate
words
o the number of lines and words in the file being checked
o the time taken to check the file
o how many unrecognised words were found
o a list, in alphabetical order, of unrecognised words
By default, only the 'main' English dictionary is used. You can
stop a report by pressing 'q' or 'Q' at the 'More...' prompt.
You can use the 'Options' and 'Dictionaries' menus, described
below, to alter the format and content of the report and to
select additional dictionaries.
Quit Spell/ST:
When you're finished, click on this item to return to the desk-
top.
1.3 The 'Options' Menu
The 'Options' menu looks something like the following:
     -------
    |Options|
    |----------------------
    | Alphabetical order   |   This item will have a tick mark...
    | Order of occurrence  |
    | Words in context     |
    |----------------------|
    | Duplicates           |   ...so will this...
    | Split infinitives    |   ...and this...
    | Make suggestions     |
    | Ignore u/c words     |
    |----------------------|
    | Statistics           |   ...and this
    | Frequency counts     |
    |----------------------|
    | Log unknown words    |
     ----------------------
Items marked with ticks are Spell/ST's default options.
Alphabetical order:
This item causes an alphabetical list of all unrecognised words
to be produced. Note that a frequency count of how many times
each word appeared is not included; see the 'Frequency counts'
option below. [Because Spell/ST has to sort all unrecognised
words into alphabetical order, it will take longer to start prod-
ucing a report compared with the 'Order of occurrence' and 'Words
in context' options (see below).]
Order of occurrence:
List all unrecognised words in order of occurrence with each word
preceded by the number of the line in which it occurs. Note that
all occurrences of any unrecognised word will be displayed.
Words in context:
Display all lines containing an unrecognised word, preceded by
their line number, and underline each unrecognised word with
carets (^).
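
The caret underlining can be pictured with a short Python sketch.
It is an illustration only: the function name is hypothetical, a
'word' is simplified to a run of letters (Appendix A gives the
real definition), and the dictionary is a plain set of lower-case
words.

```python
import re


def caret_line(line, known_words):
    """Build the line of carets to print beneath `line`."""
    marks = [" "] * len(line)
    for match in re.finditer(r"[A-Za-z]+", line):
        if match.group().lower() not in known_words:
            # Put one caret under every letter of the unknown word.
            width = match.end() - match.start()
            marks[match.start():match.end()] = "^" * width
    return "".join(marks).rstrip()
```

Printing the source line followed by the result of caret_line()
gives output of the kind described above.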
Duplicates:
Report consecutive duplicate words and their line number.
[A common mistake, especially after a document has been edited
several times, is to have two consecutive words the same. Genuine
duplicated words are rare in English, so Spell/ST will report any
duplicated words and the relevant line number. One of the dup-
licates may be at the end of the previous line. Since blank lines
are not significant to Spell/ST, repetitions can be unnecessarily
reported between a section heading and the text which follows
it, for example.]
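
A minimal Python sketch of such a check is given below. It is not
Spell/ST's code: the function name is hypothetical and a 'word' is
simplified to a run of letters. Note that the previous word is
carried across line ends and blank lines, which reproduces the
behaviour described in the bracketed note.

```python
import re


def find_duplicates(lines):
    """Return (line number, word) for each consecutive repeated word."""
    previous = None
    found = []
    for number, line in enumerate(lines, start=1):
        for word in re.findall(r"[A-Za-z]+", line):
            word = word.lower()
            if word == previous:
                found.append((number, word))
            previous = word  # deliberately not reset at end of line
    return found
```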
Split infinitives:
Report split infinitives and their line number.
Make suggestions:
Use this item to ask Spell/ST to make suggestions for the correct
spelling of unrecognised words. Unrecognised words for which no
plausible suggestions can be made will be listed next in alpha-
betical order. We recommend you use the 'Ignore u/c words'
option (see below) when you use the 'Make suggestions' option;
this could help Spell/ST save time trying to find correct spell-
ings for file names and computerese, for example.
Ignore u/c words:
Use this item to make Spell/ST ignore words containing only
upper-case letters, digits and special characters. This could
help Spell/ST avoid checking non-existent words like file names.
Statistics:
Produce statistical information on the file whose spelling is
being checked.
Frequency counts:
Against each unrecognised word reported add the number of times
it occurred in the document. This is of use only with the 'Alpha-
betical order' option.
Log unknown words:
Cause Spell/ST to add unrecognised words to a file, NEWWORDS.LOG,
in the folder from which SPELL.PRG is executed. Note that no
words will be added unless the main, names and computerese dict-
ionaries are selected.
[Unrecognised words are added to the log in alphabetical order
and the name of the source document is not recorded. You can
periodically add valid words from NEWWORDS.LOG to your diction-
aries.]
1.4 The 'Dictionaries' Menu
The 'Dictionaries' menu looks something like the following:
     ------------
    |Dictionaries|
    |--------------
    | Main         |   This item will have a tick mark
    | Names        |
    | Computerese  |
    | Reject       |
    |--------------|
    | User 1       |
    | User 2       |
    | User 3       |
    |--------------|
    | All          |   or 'None'
    |--------------|
    | Create...    |
     --------------
A tick indicates Spell/ST's default dictionary.
Main:
Select the dictionary of English words.
Of the three dictionaries supplied with Spell/ST, we recommend
you use only the English dictionary when you first check a docu-
ment.
Names:
Select the dictionary of names and places.
If you select the extensive 'names' dictionary without having
previously scanned your document without it, Spell/ST could fail
to detect a misspelling. For example, 'bangor' could be recog-
nised as valid when you really meant 'banger'.
Computerese:
Select the dictionary of computerese words.
If you select the extensive 'computerese' dictionary without hav-
ing previously scanned your document without it, Spell/ST could
fail to detect a misspelling. For example, 'pascal' could be
recognised as valid when you really meant 'rascal'.
There are many technical words which would not appear in a nor-
mal dictionary. Spell/ST can be instructed to check words against
personal dictionaries supplied by you, besides checking its own.
The following four options may be selected if you wish such dict-
ionaries to be used. Please note that, to use your own
dictionaries, you'll have to construct a new master dictionary
from them with the MakeDict program.
Reject:
Select your dictionary of words to be rejected.
[You may often mistype a word as some other valid word, 'my' as
'mu' or 'trial' as 'trail', for example. To avoid changing the
standard dictionaries to cope with this situation, you may cons-
truct a dictionary containing words which are always to be
treated as unrecognised, regardless of whether they appear in
another dictionary. A reject dictionary could also be used to
cause the rejection of English words which are invalid in
American English; this could be used with a user dictionary (see
below) containing words which are valid in American, but not
English, English.
You could construct Spell/ST dictionaries for another language.
The only restriction is that its alphabet should be reasonably
represented by ASCII.]
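
The precedence of the reject dictionary over all others can be
expressed in a few lines of Python. This is a sketch under stated
assumptions: the function name is hypothetical and each dictionary
is modelled as a set of lower-case words, which is not how
MASTER.DIC is actually stored.

```python
def is_recognised(word, dictionaries, reject=frozenset()):
    """True if `word` is in a selected dictionary and not rejected."""
    word = word.lower()
    if word in reject:
        # A rejected word is reported even if another dictionary has it.
        return False
    return any(word in dictionary for dictionary in dictionaries)
```

For example, with 'trail' in the reject dictionary, 'trail' is
reported as unrecognised while 'trial' is still accepted.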
User 1:
Select your first dictionary.
User 2:
Select your second dictionary.
User 3:
Select your third dictionary.
All:
Cause all the above dictionaries to be selected, if they are
present.
None:
Cause all the above dictionaries to be de-selected.
[When all dictionaries are de-selected, Spell/ST will recognise
no words in your document. 'None' may be used with the 'Alpha-
betical order' option (see above) to produce an alphabetical list
of all the words in your document.]
The following menu item will be of interest to those who wish to
maintain Spell/ST dictionaries.
[Create:
This menu item is used to instruct Spell/ST to prepare to create
a dictionary. Fill the file-selector with the name of a '.WRD'
file. When you next use 'Scan document', the unrecognised words
in the document will be written to the '.WRD' file you specified.
That file will be suitable for adding to the master dictionary
with MakeDict.
It is a good idea to select all Spell/ST's standard dictionaries
- and all your own as well - when you use 'Create'.
If you selected a 'User 1' dictionary, it will be extended into
the '.WRD' file with the unknown words from your document. Words
added will be marked to the right with '<', aiding the location
and checking of the new words.
Please note there is a limit of about 700 on the number of un-
known words which may be added with 'Create'.]
2 USING MAKEDICT.PRG
2.0 Introduction
The MakeDict program is a utility for maintaining a Spell/ST mas-
ter dictionary. It reads some or all of the various '.WRD' files
supplied with Spell/ST - or amended or provided by you - and
makes from them two files, MASTER.DIC and MASTER.IND. The idea is
regularly to add to the '.WRD' files and occasionally to apply
MakeDict to them.
[Those who wish to maintain Spell/ST dictionaries should consult
Appendices A and B. It is often easier to maintain your personal
dictionaries, not those supplied with Spell/ST. Incidentally, we
recommend Tempus 2 for editing large '.WRD' files.]
A complete run of MakeDict takes a long time: about three-and-a-
quarter minutes from hard disc and about ten-and-a-half minutes
from diskette, so it's not something you'll want to do too often.
However, it is worthwhile if you regularly use Spell/ST and wish
its dictionaries accurately to reflect your needs. You'll have to
use MakeDict if you wish Spell/ST to use any of the four personal
dictionaries.
2.1 The 'Desk' Menu
MakeDict's 'Desk' menu looks something like the following:
     ----
    |Desk|
    |-------------------
    | About MakeDict... |
     -------------------
Our copyright notice and MakeDict's version number are displayed
when you click on the 'About MakeDict...' item.
2.2 The 'Dictionaries' Menu
The 'Dictionaries' menu looks something like the following:
     ------------
    |Dictionaries|
    |----------------
    | Select...      |
    | Save setup     |   This item is disabled on first entry...
    | Make           |   ...as is this...
    | Information... |   ...and this
    |----------------|
    | Quit MakeDict  |
     ----------------
The disabled items will be activated later.
Select:
When you click on this item, a multiple file-selector is display-
ed which you fill with the names of the '.WRD' files to be
included in your master dictionary; at least one must be speci-
fied and you must supply a path for these. The selector is also
used to locate your master dictionary and its index. Note that
their path may be different from that of the '.WRD' files. Click
on the 'Okay' button when you've made your selection. MakeDict
will complain if you've typed incorrect path names or the names
of non-existent '.WRD' files.
Save setup:
Once you have decided which '.WRD' files to use, click on the
'Save setup' item to save your selections for the next time.
Make:
Click on this item to generate a master dictionary. This can take
some time - see Section 2.0.
Information:
Once the generation of the master dictionary is complete, click
on the 'Information' item to obtain the following statistics on
the dictionaries you have used:
o the time taken to read or write a dictionary
o the number of words each dictionary contains
o the number of 'variants' (see Appendix B) each dictionary
contains
o the number of bytes occupied by each dictionary
Quit MakeDict:
When you're finished, click on this item to return to the desk-
top.
APPENDIX A What is a Word?
Spell/ST usually considers each line of the source file sep-
arately. The exception is when a word appears to be hyphenated
at the end of a line, in which case that line and the following
line are effectively joined and the hyphen removed.
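
The joining of a hyphenated line to its successor might look like
this in Python. The sketch is illustrative only: the function name
is hypothetical, and it assumes that a line 'appears to be
hyphenated' when its last non-blank character is a hyphen which
directly follows a letter.

```python
def join_hyphenated(lines):
    """Merge each line ending in a hyphenated word with the next line."""
    joined = []
    carry = ""  # text held over from a line that ended with a hyphen
    for line in lines:
        if carry:
            line = carry + line.lstrip()
            carry = ""
        stripped = line.rstrip()
        if len(stripped) >= 2 and stripped[-1] == "-" \
                and stripped[-2].isalpha():
            carry = stripped[:-1]  # drop the hyphen, keep the text
        else:
            joined.append(line)
    if carry:
        joined.append(carry)
    return joined
```

Since two source lines become one, a real implementation would
also have to remember the original line numbers for its report.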
Lines and/or words will be ignored when:
o a line starts with a full stop
o a word is preceded by '\' or '|' (text in round brackets
after such a word is also ignored)
o a letter is preceded by '$'
o the first two characters in a line are '//' or '/*'
The first three avoid the text-formatting commands of GCAL,
LaTeX, PROFF, TeX and Tidy, whilst the last avoids MVS JCL.
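
The four rules above can be approximated in Python as follows.
This is a sketch, not Spell/ST's code: the function name is
hypothetical, and it assumes that the bracketed text must follow
the command word immediately and that a command word consists of
letters only.

```python
import re

# A word introduced by '\' or '|', plus any "(...)" directly after it.
_COMMAND = re.compile(r"[\\|][A-Za-z]+(?:\([^)]*\))?")
# A single letter introduced by '$'.
_DOLLAR = re.compile(r"\$[A-Za-z]")


def strip_formatting(line):
    """Return `line` with the ignored parts blanked out."""
    if line.startswith((".", "//", "/*")):
        return ""  # the whole line is ignored
    line = _COMMAND.sub(" ", line)
    return _DOLLAR.sub(" ", line)
```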
Within a line, a word is a sequence of characters satisfying the
following conditions:
o each word is as long as possible
o a word starts with a letter, or a digit followed by a
letter, or the digit 1 followed by a digit followed by a
letter, and may contain letters and digits
o a word may also contain full stops, hyphens, and
apostrophes, provided that each such special character is
preceded and followed by a letter or digit
Note that upper-case letters are effectively converted to lower-
case and words truncated to a maximum of 26 characters. One-
letter words and the two-character sequence 's at the end of a
word are ignored.
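
Read literally, the definition above corresponds to the regular
expression in the following Python sketch. The names are
hypothetical and the code is an interpretation of the rules, not
the routine used by Spell/ST; in particular it assumes that a
trailing 's is stripped before the word is truncated.

```python
import re

# A word starts with a letter, a digit plus a letter, or "1" plus a
# digit plus a letter. '.', '-' and "'" must sit between letters/digits.
_WORD = re.compile(
    r"(?:[A-Za-z]|[0-9][A-Za-z]|1[0-9][A-Za-z])[A-Za-z0-9]*"
    r"(?:[.\-'][A-Za-z0-9]+)*"
)
MAX_LENGTH = 26


def words_in_line(line):
    """Split `line` into lower-case words as Appendix A describes."""
    words = []
    for match in _WORD.finditer(line):
        word = match.group().lower()
        if word.endswith("'s"):
            word = word[:-2]       # trailing 's is ignored
        word = word[:MAX_LENGTH]   # truncate long words
        if len(word) > 1:          # one-letter words are ignored
            words.append(word)
    return words
```

For instance, the line "The dog's 3rd e.g. re-entry, a B." yields
the words the, dog, 3rd, e.g and re-entry.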
APPENDIX B Dictionary Format
Within a Spell/ST dictionary (with a name ending in '.WRD'),
words are ordered first by length (shortest words first), and
then alphabetically, one word per line, with no leading spaces.
All letters are treated as if they were in lower case. A '.WRD'
dictionary may contain comment lines preceded by an exclamation
mark.
If you use Spell/ST's 'create' facility to extend a '.WRD' file,
the comments it contains will not appear in its extension.
Spell/ST's master dictionary, contained in MASTER.DIC and cons-
tructed by MakeDict from the various '.WRD' files, is a binary
file not readable by you. MakeDict issues a warning if the size
of a master dictionary grows to within one per cent of the maxi-
mum allowed (290,000 bytes at present). The master dictionary
supplied with Spell/ST has room for about a further 275 eight-
letter words, excluding variants. Versions of Spell/ST and
MakeDict having larger or smaller dictionary capacities are
available on request - see Section 0.5.
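
As a worked example of the warning threshold: one per cent of
290,000 bytes is 2,900 bytes, so the warning applies from 287,100
bytes upwards. In Python, with hypothetical names and integer
arithmetic to avoid rounding:

```python
MAX_MASTER_BYTES = 290_000  # limit quoted in this guide


def near_capacity(size_bytes):
    """True if the size is within one per cent of the maximum."""
    return size_bytes * 100 >= MAX_MASTER_BYTES * 99
```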
Alphabetical order is as follows:
. - ' abcdefghijklmnopqrstuvwxyz 0123456789
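
The ordering of a '.WRD' file (by length, then by the alphabet
above) can be written as a sort key in Python. This is a sketch
with hypothetical names; it assumes that the length is that of the
word itself, excluding any backslash and affix flags, and that
comment lines have been set aside beforehand.

```python
ALPHABET = ".-'abcdefghijklmnopqrstuvwxyz0123456789"
_RANK = {char: rank for rank, char in enumerate(ALPHABET)}


def wrd_sort_key(entry):
    """Sort key for one dictionary line such as 'stare\\abc'."""
    word = entry.partition("\\")[0].lower()  # drop any affix flags
    return (len(word), [_RANK[char] for char in word])
```

Note that digits sort after letters here, the reverse of plain
ASCII order, so sorting "by", "a1", "an" with this key gives
"an", "a1", "by".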
The size of the main dictionary has been reduced by a factor of
about 2.5 by attaching affix flags to word stems.
A backslash '\' separates the word from the affix flags. Various
letters after the backslash represent rules for deriving affixes,
as shown in the table which follows these examples:
    Word\flags         Represents

    stare\abc          stare, stares, staring, stared;
    pose\bcstw         pose, posing, posed, unposed, repose,
                       reposing, reposed, dispose, disposing,
                       disposed;
    divide\abcfhims    divide, divides, dividing, divided, divider,
                       division, dividers, divisions, undivided;
    affect\abchlmqs    affect, affects, affecting, affected,
                       affection, affectation, affections,
                       affectations, unaffected.
The affix flags, listed below, are ordered so that prefix and
suffix flags appear in separate groups arranged in order of freq-
uency of occurrence.
    Flag  Action/condition           Prefix/suffix

    a     add                        -s
    b     replace -ee by             -eeing
          else replace -e by         -ing
          else add                   -ing
    c     replace -<consonant>y by   -ied
          else replace -e by         -ed
          else add                   -ed
    d     replace -ic by             -ically
          else replace -y by         -ily
          else add                   -ly
    e     replace -y by              -ies
          else replace -f by         -ves
          else replace -fe by        -ves
          else add                   -es
    f     replace -<consonant>y by   -ier
          else replace -e by         -er
          else add                   -er
    g     replace -ate by            -acy
          else replace -ant by       -ancy
          else replace -ent by       -ency
          else replace -e by         -y
          else add                   -y
    h     replace -mit by            -mission
          else replace -ibe by       -iption
          else replace -ume by       -umption
          else replace -de by        -sion
          else replace -e by         -ion
          else add                   -ion
    i     apply rule f, then add     -s
    j     double last letter, add    -ed or -ing
    k     replace -y by              -iness
          else add                   -ness
    l     replace -e by              -ation
          else replace -y by         -ication
          else add                   -ation
    m     apply rule h, then add     -s
    n     replace -ble by            -bility
          else replace -acious by    -acity
          else replace -ous by       -ity
          else replace -e by         -ity
          else add                   -ity
    o     replace -<consonant>y by   -iable
          else replace -e by         -able
          else add                   -able
    p     replace -e by              -est
          else replace -y by         -iest
          else add                   -est
    q     apply rule l, then add     -s
    r     apply rule b, then add     -s
    s     add prefix                 un-     apply rules c,j(-ed) only
    t     add prefix                 re-     apply rules a-r,x
    u     add prefix                 un-     apply rules a-r,x
    v     add prefix                 in-     apply rules a-r,x
    w     add prefix                 dis-    apply rules a-r,x
    x     check consistent use of    -ise and -ize
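
To show how a flagged entry expands, here is a Python sketch that
covers only flags a, b, c, f, h, i, l, m, q and s. It is an
illustration, not MakeDict's code: all names are hypothetical, and
flag s is read as 'un-' plus the rule c form, produced only when
flag c is also present (rule j is omitted).

```python
def _swap(stem, rules, default):
    """Apply the first matching (old, new) ending, else add default."""
    for old, new in rules:
        if stem.endswith(old):
            return stem[:len(stem) - len(old)] + new
    return stem + default


def _consonant_y(stem):
    return len(stem) >= 2 and stem[-1] == "y" and stem[-2] not in "aeiou"


def rule_b(stem):
    return _swap(stem, [("ee", "eeing"), ("e", "ing")], "ing")


def rule_c(stem):
    if _consonant_y(stem):
        return stem[:-1] + "ied"
    return _swap(stem, [("e", "ed")], "ed")


def rule_f(stem):
    if _consonant_y(stem):
        return stem[:-1] + "ier"
    return _swap(stem, [("e", "er")], "er")


def rule_h(stem):
    return _swap(stem, [("mit", "mission"), ("ibe", "iption"),
                        ("ume", "umption"), ("de", "sion"),
                        ("e", "ion")], "ion")


def rule_l(stem):
    return _swap(stem, [("e", "ation"), ("y", "ication")], "ation")


SUFFIX_RULES = {
    "a": lambda stem: stem + "s",
    "b": rule_b,
    "c": rule_c,
    "f": rule_f,
    "h": rule_h,
    "i": lambda stem: rule_f(stem) + "s",
    "l": rule_l,
    "m": lambda stem: rule_h(stem) + "s",
    "q": lambda stem: rule_l(stem) + "s",
}


def expand(entry):
    """Return the set of words represented by e.g. 'stare\\abc'."""
    word, _, flags = entry.partition("\\")
    forms = {word}
    for flag in flags:
        rule = SUFFIX_RULES.get(flag)  # unsupported flags are skipped
        if rule is not None:
            forms.add(rule(word))
    if "s" in flags and "c" in flags:
        forms.add("un" + rule_c(word))
    return forms
```

Applied to the entries 'stare\abc', 'divide\abcfhims' and
'affect\abchlmqs', expand() produces the word lists given in the
examples before the table.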
APPENDIX C Fatal Error Messages
Spell/ST
"A line is longer than 256 characters."
The document being scanned has at least one line containing
more than 256 characters.
Solution: shorten the offending line(s).
"Can't find <filename>!"
where <filename> is MASTER.DIC, MASTER.IND or SPELL.RSC.
Solution: read Section 0.2.
"Error <letter> at address <address> {message}"
If this occurs, please supply us with full details.
"The master dictionary is too big."
This occurs if MASTER.DIC is larger than 290,000 bytes.
Solution: apply MakeDict to fewer and/or smaller '.WRD' files.
"There are too many unknown words."
Possible solutions: select more dictionaries, use the 'Ignore
u/c words' option, or correct some spelling mistakes.
"Spell doesn't run at low resolution."
Solution: select medium resolution or use a monochrome monitor.
MakeDict
"A dictionary word is too short."
Solution: remove the one-letter word from the offending '.WRD'
file.
"Can't find <filename>!"
where <filename> is MAKEDICT.RSC or MAKEDICT.SET.
Solution: read Section 0.2.
"Dictionary not ordered by size around line <line>."
Solution: relocate the word in the offending '.WRD' file.
"MakeDict doesn't run at low resolution."
Solution: select medium resolution or use a monochrome monitor.
"The master dictionary will be too big."
Solution: select fewer and/or smaller '.WRD' files.
APPENDIX D Background Information
SPELL, implemented on mainframes under MVS and VM by Murray
Langton, doesn't have to worry about how much memory is used. The
dictionaries are held in human-readable form and converted to
internal form each time SPELL is used.
SPELL, implemented for the Atari ST micro-computer by David
Tilley, has been split into two parts: MakeDict takes the human-
readable dictionaries and converts them to internal form (includ-
ing index construction). Spell/ST can then read the converted
dictionaries more quickly. Split infinitive detection has been
added.
The problems with SPELL on a micro-computer are the size of the
dictionaries and indices (some 300 kilobytes) and the time taken
to read them, especially from diskette. We are considering ways
of alleviating these problems.
In the meantime, we offer you a versatile, if rather cumbersome,
package.
APPENDIX E Known Bugs
Spell/ST and MakeDict: window actions are sluggish immediately on
exit. They soon recover, however.
Spell/ST's 'stack' was increased from 4 to 32 kilobytes to allow
for the detection of a very large number of unknown words. This
may still prove insufficient for some documents, in which case the
message 'There are too many unknown words' appears.
If you try to use 'Create' to add more than ~700 words, Spell/ST
can crash, after which the machine hangs.
In medium resolution only, if you click on the 'About Spell/ST'
menu item _after_ scanning a document, the resulting form is cor-
rupted and Spell/ST exits. This annoying bug is proving difficult
to track down but, fortunately, the machine doesn't hang.
Please report bugs via e-mail (see Section 0.5).
APPENDIX F Improvements
The following improvements to Spell/ST come to mind:
Reduce size of master dictionary.  )
Speed up document scan.            )  see Appendix D
Save on memory.                    )
Provide a Spell/ST desk accessory. No point until the above
improvements are made.
Allow a dot in a MakeDict folder name.
Highlight unknown words in context output instead of using ^.
Improve treatment of end-of-line hyphens.
Provide interactive correction.
*** End of document ***