Text File | 1984-11-04 | 16KB | 440 lines
Lancelot documentation for programming the analyzer
---------------------------------------------------
Lancelot analyzes language by interpreting commands given in an
analysis file. The commands are called actions, and each action has a
particular form. An action starts with a word called a dot-dot command
(e.g., ..WORD), followed by a print message, and then by one or more elements
as defined below.
Several sample (and useful) analysis files are included with Lancelot,
and typical usage should be inferred from reading them. They do not all use
all of the options of Lancelot, and so are not a substitute for reading this
documentation.
The commands are given below in alphabetical order. Occasionally
there are common points between the commands, and they are discussed in a note
to the reader. The syntactic form of a command is followed by the semantics
of the command.
Definitions
Print messages
A print message contains an arbitrary number of text lines, which may or
may not be printed depending upon where the message appears in an action. A
print message begins on a line that starts with a colon, and continues to the
next line that starts with a dot or colon. Print messages may contain
variables whose values will be substituted when the message is printed.
Occasionally you may wish to print nothing in a print message. You
may accomplish that using the following print message which is called the
empty print message:
:.
Blank lines before a print message are ignored, while blank lines
after a print message are included in the print message. This happens
because a print message does not stop until the next line that starts with a
dot or colon.
Variables
Variables are words that start with a sharp sign (#). Lancelot
defines a number of variables that are set by various dot-dot commands. The
value that the variable takes on is described below with the dot-dot command
that sets the variable. You may define your own variables using the ..CALC
command. These variables may be used in the same manner as Lancelot's
variables. Lancelot's variables are all uppercase to make them stand out from
the rest of the print message. We suggest that your variables be given in
uppercase also. Case is significant in variable names.
Numbers
Some of the dot-dot commands have numbers as arguments. When the syntax
of the dot-dot command is given, the numbers are given names and enclosed in
angle brackets (<>). For example, <lower_bound> means a number whose value
is used as a lower bound.
Literal words and search words
Lancelot has two concepts of a word, the literal word and the search
word. The literal word is a word that must occur literally in the text, and
is used by the ..MOST, ..SIZE, ..SHOW, and ..WORD commands. Literal words are
not sensitive to case. Search words are used by phrase matching and
conditional matching dot-dot commands, and are more powerful (and slower) than
literal words.
If a letter in a search word is capitalized, that letter matches
either an upper or a lower case letter in the text. A text word is terminated
by one of the following characters: .?!,;)<space><tab><return><linefeed>. If a
search word contains a star (*), the star matches as many text characters as
possible, including none. If a search word contains a percent (%), then that
percent matches one or more of the following characters:
<space><tab><return><linefeed>. Percent characters are useful only in
conditional matching. Examples:
Analysis     Matching Text
--------     -------------
*ing         ing
             ring
             cling
             string
Crawl*       crawl
             Crawl     {note that C matches both upper and lower case}
             crawled
             crawling
a*c          ac
             abc
             abbc
             abbbc
             abbbbc
in%to        "in to"
             "in   to"
             "in
             to"
             "in
                to"
Syntax of a Lancelot Action
----------------------------
1. All actions start with a dot-dot command which determines the action to
occur. For example: ..WORD means that single words are to be searched
in the text. ..ALLCOND means that all sentences of the text are to be
searched to determine if they contain a specified word or sequence of
words.
2. The second element of an action is an unconditional print message, which
is always printed (don't forget that :. prints nothing).
3. The third element of an action is one or more literal words or search
words which are to be acted upon. The particular action taken depends
upon the dot-dot command.
4. The fourth element is a conditional print message that is printed only if
the literal or search word(s) were found.
5. Optionally, most dot-dot commands allow the pair of element 3 and
element 4 to be repeated any number of times.
6. An action is terminated by another dot-dot command.
For example:
..allcond {Element 1}
:Your text is now being searched for three word constructions. {Element 2}
.Ha* .preference {Element 3}
:The text says "has(have)...a preference." Say "prefer." {Element 4}
.Spell* .out {Element 3}
:The text says "spell(s)...out." Say "explain(s)." {Element 4}
.Take* .consideration
:The text says "take(s)...consideration." Say "consider(s)."
..end {Element 6}
Lancelot Dot-dot Commands
-------------------------------------------------------------
..allcond
:print message (unconditional)
.word1 .word2 .word3 ...
:print message (conditional upon finding a phrase containing the above words)
.optional next element 3
:optional next element 4
The text is searched for all sentences that contain search word .WORD1
followed by zero or more words, followed by any one of .WORD2, .WORD3, and so
on. If the element consists of just .WORD1, then those sentences that contain
.WORD1 are accepted and the conditional print message is printed. If the
element contains .WORD1 and .WORD2, then a sentence is accepted only if it
contains both .WORD1 and .WORD2, in that order. Note that .WORD1 may match an
entire text phrase if it uses the percent character.
When a sentence is found that is accepted and the conditional print
message is not empty, not only is the element 4 print message printed but
also a window of three text lines is displayed to the user. This window
surrounds the place in the text where the phrase was found.
This action is repeated until all the occurrences in all sentences have
been found. The next element 3 is then used for the search. Multiple
elements in a single ..ALLCOND are equivalent to the same elements in multiple
..ALLCOND's.
The variable #COUNT is set to the number of times the phrase is found,
and #LINE is set to the last line in which the phrase is found. Note that
#LINE is set before the print message is printed, so the print message can
print the line in which the phrase was found.
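As a sketch of the percent character and the #LINE variable used together,
the action below looks for the phrase "in order to" even when it is split
across a line break. The search word and the message wording are illustrative
assumptions and are not taken from the sample analysis files.

```
..comment
:Illustrative sketch only; the search word and messages are assumptions.
..allcond
:Your text is now being searched for wordy phrases.
.In%order%to
:Line #LINE says "in order to." Say "to."
..end
```

The capital I lets the phrase match at the start of a sentence as well as in
the middle of one.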
-------------------------------------------------------------
..allphrase
:print message (unconditional)
.word1 word2 word3 ... (NOTE: no dots - this is a phrase)
:print message (conditional upon finding the phrase.)
.optional next element 3
:optional next element 4
The text is searched for all sentences that contain the phrase WORD1
followed by WORD2, followed by WORD3 ... If a sentence contains the
phrase of element 3 then the sentence is accepted.
When a sentence is found that is accepted and the conditional print
message is not empty, not only is the element 4 print message printed but
also a window of three text lines is displayed to the user. This window
surrounds the place in the text where the phrase was found.
This action is repeated until all phrase occurrences in all sentences
have been found. The next element 3 is then used for the search. Multiple
elements in a single ..ALLPHRASE are equivalent to the same elements in
multiple ..ALLPHRASE's.
The variable #COUNT is set to the number of times the phrase is found,
and #LINE is set to the last line in which the phrase is found. Note that
#LINE is set before the print message is printed, so the print message can
print the line in which the phrase was found.
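A minimal sketch of an ..ALLPHRASE action with two element 3 and element 4
pairs follows. The phrases and the suggested replacements are illustrative
assumptions; only the layout comes from the syntax given above.

```
..comment
:Illustrative sketch only; the phrases and advice are assumptions.
..allphrase
:Your text is now being searched for stock phrases.
.at this point in time
:Line #LINE says "at this point in time." Say "now."
.due to the fact that
:Line #LINE says "due to the fact that." Say "because."
..end
```

Only the first word of each phrase carries a dot, as the syntax requires.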
-------------------------------------------------------------
..calc
.#VARNAME1 = expression
.next element
The given variable is set to the value of the expression. The
operators are plus (+), minus (-), times (*), and divide (/). There is no
unary minus, so 0-#VAR must be used to negate #VAR. Times and divide have
higher precedence than plus and minus. This is one of the few commands that
has no print message.
Example:
..count {this will count the words - discussed below}
:.
.0 1000 {we want all words counted regardless of length}
..calc {now we estimate the number of characters in the text}
.#CHARACTERS = #COUNT * 6
..type
:Your text contained about #CHARACTERS characters.
-------------------------------------------------------------
..case
:print message (unconditional)
.#VARIABLE
.<lower_bound> <upper_bound>
:print message (conditional upon the value of #VARIABLE falling between
lower_bound and upper_bound.)
.optional next element 3
:optional next element 4
Example:
..sent {this will count the sentences - discussed below}
:We are now testing for sentence length
.Dr .Mr .Ms .Mrs
..case
:.
.#SLENGTH
.50 100
:Your longest sentence is too long. It was found in line #LINE.
.100 1000
:Your longest sentence is ridiculously long. It was found in line #LINE.
-------------------------------------------------------------
..clear
..CLEAR clears the screen. This usually follows a ..PAUSE command
to give the reader time to read the screen before it is cleared.
-------------------------------------------------------------
..comment
:Print message (never printed)
This command is used to include a comment in your analysis file.
-------------------------------------------------------------
..cond
:Print message (unconditional)
.word1 .word2 .word3 ...
:print message (conditional upon finding a phrase containing the above words)
.optional next element 3
:optional next element 4
This is identical to ..ALLCOND except that only the first sentence found
that contains the matching phrase is displayed to the user.
-------------------------------------------------------------
..count
:print message (unconditional)
.<lower> <upper>
:Print message
Lancelot's #COUNT variable is set to the number of words in the
entire text whose lengths fall between <lower> and <upper>, inclusive.
Example:
..count
:Testing for word lengths
.2 3
:Your text contains #COUNT words whose length is either 2 or 3
.10 20
:Your text has #COUNT long words
-------------------------------------------------------------
..double
:print message (unconditional)
.word1 .word2 ...
:print message (conditional)
The text is searched for double words, i.e., two sequential words
which are the same. If a double word is found and is not in the list of
words (element 3), then the print message is printed. The word "had" should
probably be included in the list of excluded words because "had had" is proper
English.
The variable #COUNT is set to the number of double words found,
and #LINE is set to the last line in which double words are found. Note that
#LINE is set before the print message is printed, so the print message can
print the line in which the double word was found.
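A minimal sketch of a ..DOUBLE action follows. The excluded words and the
message wording are illustrative assumptions; "had" is excluded for the reason
given above.

```
..comment
:Illustrative sketch only; the excluded words are assumptions.
..double
:Your text is now being searched for doubled words.
.had .that
:A doubled word was found in line #LINE.
..end
```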
-------------------------------------------------------------
..end
The analysis file must end with ..end.
-------------------------------------------------------------
..histo
:print message (unconditional)
.word1 .word2 ...
A histogram of your sentence lengths is printed. Each line gives the
number of the sentence in the text, the line number of the text in which the
sentence appears, and the sentence length. The sentence length is shown by a
bar of asterisks, one asterisk per word in the sentence. If the sentence
length is longer than 64 words, then a plus sign is shown at the right end of
the asterisks, indicating that the histogram overflows the right margin.
The list of dot words has the same meaning as in the ..SENT command.
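A minimal sketch of a ..HISTO action follows. The word list is the one used
in the ..CASE example, and the message wording is an illustrative assumption.

```
..comment
:Illustrative sketch only; the message wording is an assumption.
..histo
:Each row below shows one sentence. Each asterisk stands for one word.
.Dr .Mr .Ms .Mrs
..end
```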
-------------------------------------------------------------
..most
:print message (unconditional)
.word1 .word2 .word3 ... <number>
The most frequently used words of the text are displayed along with the
number of occurrences of each. <number> determines how many words are
displayed. The words .WORD1 ... are words which are to be excluded from the
display. Typically these are "a", "the", etc., and the word processing symbols
"p", "b", etc. There is no conditional print message (element 4). Also, at
least one .word must be present.
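A minimal sketch of a ..MOST action follows. The excluded words and the
number 10 are illustrative assumptions; choose whatever suits your own text.

```
..comment
:Illustrative sketch only; the excluded words and the number are assumptions.
..most
:The ten words you use most often are:
.a .an .the .and .of .to 10
..end
```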
-------------------------------------------------------------
..none
:print message (conditional)
If the preceding ..ALLCOND, ..COND, ..DOUBLE, or ..WORD dot-dot command
did not find whatever it was looking for, then the print message is printed.
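As a sketch, the ..NONE action below follows a ..COND action and reports when
nothing was found. The search words are borrowed from the example under
"Syntax of a Lancelot Action"; the wording of the ..NONE message is an
illustrative assumption.

```
..comment
:Illustrative sketch only; the ..none message wording is an assumption.
..cond
:Your text is now being searched for "has a preference."
.Ha* .preference
:The text says "has(have)...a preference." Say "prefer."
..none
:No such construction was found.
..end
```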
-------------------------------------------------------------
..pause
The display screen is frozen until the user presses a key.
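A minimal sketch of ..PAUSE used together with ..TYPE and ..CLEAR follows.
The message wording is an illustrative assumption.

```
..comment
:Illustrative sketch only; the message wording is an assumption.
..type
:The first part of the analysis is complete.
Press any key to continue.
..pause
..clear
..type
:Now checking sentence length.
..end
```

The first ..TYPE message runs over two lines because a print message continues
until the next line that starts with a dot or colon.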
-------------------------------------------------------------
..phrase
:Print message (unconditional)
.word1 word2 word3 ...
:print message (conditional upon finding the phrase)
.optional next element 3
:optional next element 4
This is identical to ..ALLPHRASE except that only the first sentence
found that contains the matching phrase is displayed to the user.
-------------------------------------------------------------
..sent
:print message (unconditional)
.word1 .word2 ...
..SENT counts sentences and finds the longest sentence. A sentence
is a group of words ended by a word whose last character is one of . ! ?.
A word does not end a sentence if it is a one-letter word or one of the
words given in the list of words (element 3). This list will frequently
include .etc, .Dr, .Mr, .Mrs, and so on.
#COUNT is set to the number of sentences. #SLENGTH is set to the
length of the longest sentence. #LINE is set to the line number of the
longest sentence.
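As a sketch of how the three variables set by ..SENT might be reported, the
action below is followed by a ..TYPE action that prints them. The word list is
the one used in the ..CASE example; the message wording is an illustrative
assumption.

```
..comment
:Illustrative sketch only; the message wording is an assumption.
..sent
:.
.Dr .Mr .Ms .Mrs
..type
:Your text has #COUNT sentences. The longest one has length #SLENGTH
and is in line #LINE.
..end
```

The report is placed in a separate ..TYPE action so that it is printed after
..SENT has set the variables.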
-------------------------------------------------------------
..show
:print message (unconditional)
.word1 .word2 .word3 ...
The number of occurrences in the text of the literal words of element
3 is displayed to the user. There is no conditional print message (element
4).
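A minimal sketch of a ..SHOW action follows. The word list and the message
wording are illustrative assumptions.

```
..comment
:Illustrative sketch only; the word list is an assumption.
..show
:Here is how often you used some common intensifiers:
.very .really .quite .rather
..end
```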
-------------------------------------------------------------
..type
:print message (unconditional)
..TYPE is used to unconditionally print a message.
-------------------------------------------------------------
..word
:print message (unconditional)
.word1 .word2 .word3 ... <number> or <number>%
:print message (conditional)
.optional next element 3
:optional next element 4
The entire text is searched for occurrences of the literal words of
element 3. If the number of occurrences of .WORD1 added to the number of
occurrences of .WORD2, and so on, equals or exceeds <number>, then the
conditional message is printed. If <number> is followed by %, the sum of the
occurrences of the element 3 words must equal or exceed that percentage of
the words in the text before the element 4 print message is displayed.
The variable #COUNT is set to the number of times the word(s) was
used.
Example:
..word
:.
.very .large 10
:Your paper has used the words VERY or LARGE #COUNT times. This is
a very large number of occurrences.
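As a sketch of the percentage form of <number>, the action below prints its
conditional message only when the listed words together make up 2% or more of
the words in the text. The word list, the threshold, and the message wording
are illustrative assumptions.

```
..comment
:Illustrative sketch only; the word list and threshold are assumptions.
..word
:.
.very .really .quite 2%
:The words VERY, REALLY, and QUITE make up 2% or more of your text.
You used them #COUNT times.
..end
```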