- Lancelot documentation for programming the analyzer
- ---------------------------------------------------
-
- Lancelot analyzes language by interpreting commands given in an
- analysis file. The commands are called actions, and each has a particular
- form. An action starts with a word called a dot-dot command (e.g., ..WORD),
- followed by a print message and then by one or more elements as defined
- below.
-
- Several sample (and useful) analysis files are included with Lancelot,
- and typical usage should be inferred from reading them. They do not all use
- all of the options of Lancelot, and so are not a substitute for reading this
- documentation.
-
- The commands are given below in alphabetical order. Occasionally
- there are common points between the commands, and they are discussed in a note
- to the reader. The syntactic form of a command is followed by the semantics
- of the command.
-
-
-
- Definitions
-
- Print messages
-
- Print messages contain an arbitrary number of text lines, which may or
- may not be printed depending upon where the message appears in an action. A
- print message begins on a line that starts with a colon, and continues to the
- next line that starts with a dot or colon. Print messages may contain
- variables whose values will be substituted when the message is printed.
-
- Occasionally you may wish to print nothing in a print message. You
- may accomplish that using the following print message which is called the
- empty print message:
-
- :.
-
- Blank lines before a print message are ignored, while blank lines
- after a print message are included in the print message. This happens
- because print messages do not stop until the next dot or dot-dot command.
-
-
- Variables
-
- Variables are words that start with a sharp sign (#). Lancelot
- defines a number of variables that are set by various dot-dot commands. The
- value that the variable takes on is described below with the dot-dot command
- that sets the variable. You may define your own variables using the ..CALC
- command. These variables may be used in the same manner as Lancelot's
- variables. Lancelot's variables are all uppercase to make them stand out from
- the rest of the print message. We suggest that your variables be given in
- uppercase also. Case is significant in variable names.
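-
- As a rough illustration of how variables in a print message are replaced by
- their values, here is a minimal Python sketch. It is not Lancelot's own code:
- the function name expand_message, the rule that a name is a sharp sign
- followed by a letter and then letters, digits, or underscores, and the choice
- to leave undefined variables untouched are all assumptions made for the
- example.
-
```python
import re

def expand_message(message, variables):
    """Replace each #NAME in message with its value from variables.

    variables maps names such as "#COUNT" to values. Lookup is
    case-sensitive, as the text above requires. Names that are not
    defined are left as written (an assumption, not documented behavior).
    """
    def replace(match):
        name = match.group(0)
        return str(variables.get(name, name))

    # Assumed name syntax: '#', a letter, then letters, digits, or '_'.
    return re.sub(r"#[A-Za-z][A-Za-z0-9_]*", replace, message)
```
-
- For instance, expand_message("Found in line #LINE.", {"#LINE": 12}) gives
- "Found in line 12." while "#line" would be left alone, because case is
- significant.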
-
-
- Numbers
-
- Some of the dot-dot commands have numbers as arguments. When the syntax
- of the dot-dot command is given, the numbers are given names and enclosed in
- angle brackets (<>). For example, <lower_bound> means a number whose value
- is used as a lower bound.
-
-
- Literal words and search words
-
- Lancelot has two concepts of a word, the literal word and the search
- word. The literal word is a word that must occur literally in the text, and
- is used by the ..MOST, ..SIZE, ..SHOW, and ..WORD commands. Literal words are
- not sensitive to case. Search words are used by phrase matching and
- conditional matching dot-dot commands, and are more powerful (and slower) than
- literal words.
-
- If a letter in a search word is capitalized, that letter matches
- either the upper or the lower case form of that letter in the text. A text
- word is terminated by one of the following characters:
- .?!,;)<space><tab><return><linefeed>. If a search word contains a star (*),
- the star matches as many text characters as possible. If a search word
- contains a percent (%), the percent matches one or more of the following
- characters: <space><tab><return><linefeed>. Percent characters are useful
- only in conditional matching. Examples:
-
- Analysis    Matching Text
- --------    -------------
- *ing        ing
-             ring
-             cling
-             string
-
- Crawl*      crawl
-             Crawl       {note that C matches both upper and lower case}
-             crawled
-             crawling
-
- a*c         ac
-             abc
-             abbc
-             abbbc
-             abbbbc
-
- in%to       "in to"     {one space}
-             "in   to"   {several spaces}
-             "in
-             to"         {a line break}
-             "in
-                to"      {a line break followed by spaces}
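-
- The search word rules and the examples above can be approximated with
- regular expressions. The following Python sketch is only an illustration
- under stated assumptions, not Lancelot's implementation: the names WORD_CHAR,
- search_word_to_regex, and matches are invented here, a star is assumed to
- match zero or more non-terminator characters, and lowercase letters in a
- search word are assumed to match only lowercase text.
-
```python
import re

# Any character that does not terminate a text word (the terminators are
# taken from the list in the text above).
WORD_CHAR = r"[^.?!,;) \t\r\n]"

def search_word_to_regex(search_word):
    """Translate a Lancelot-style search word into a compiled regex."""
    parts = []
    for ch in search_word:
        if ch == "*":
            # Star: as many word characters as possible (greedy, may be zero).
            parts.append(WORD_CHAR + "*")
        elif ch == "%":
            # Percent: one or more spaces, tabs, returns, or linefeeds.
            parts.append(r"[ \t\r\n]+")
        elif ch.isupper():
            # Capital letter: matches that letter in either case.
            parts.append("[" + ch + ch.lower() + "]")
        else:
            # Anything else must match exactly (assumed case-sensitive).
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matches(search_word, text):
    """Return True if the whole of text matches the search word."""
    return search_word_to_regex(search_word).fullmatch(text) is not None
```
-
- With these definitions, matches("Crawl*", "crawled") and
- matches("in%to", "in\n   to") are both true, while matches("in%to", "into")
- is false because a percent needs at least one whitespace character.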
-
-
-
-
-
- Syntax of a Lancelot Action
- ----------------------------
-
- 1. All actions start with a dot-dot command which determines the action to
- occur. For example: ..WORD means that single words are to be searched
- in the text. ..ALLCOND means that all sentences of the text are to be
- searched to determine if they contain a specified word or sequence of
- words.
-
- 2. The second element of an action is an unconditional print message which is
- always printed (don't forget that :. prints nothing.)
-
- 3. The third element of an action is one or more literal words or search
- words which are to be acted upon. The particular action taken depends
- upon the dot-dot command.
-
- 4. The fourth element is a conditional print message that is printed only if
- the literal or search word(s) were found.
-
- 5. Optionally, most dot-dot commands allow the pair of element 3 and
- element 4 to be repeated any number of times.
-
- 6. An action is terminated by another dot-dot command.
-
- For example:
-
- ..allcond {Element 1}
- :Your text is now being searched for three word constructions. {Element 2}
- .Ha* .preference {Element 3}
- :The text says "has(have)...a preference." Say "prefer." {Element 4}
- .Spell* .out {Element 3}
- :The text says "spell(s)...out." Say "explain(s)." {Element 4}
- .Take* .consideration
- :The text says "take(s)...consideration." Say "consider(s)."
- ..end {Element 6}
-
-
- Lancelot Dot-dot Commands
-
- -------------------------------------------------------------
-
- ..allcond
- :print message (unconditional)
- .word1 .word2 .word3 ...
- :print message (conditional upon finding a phrase containing the above words)
- .optional next element 3
- :optional next element 4
-
- The text is searched for all sentences that contain search word .WORD1
- followed by zero or more words followed by any one of .WORD2, .WORD3, and so
- on. If the element consists of just .WORD1, then those sentences that contain
- .WORD1 will be accepted and the conditional print message will be printed.
- If the element contains .WORD1 and .WORD2, then a sentence is accepted only
- if it contains both .WORD1 and .WORD2 in that order. Note that .WORD1
- may be an entire text phrase by using the percent character.
-
- When a sentence is found that is accepted and the conditional print
- message is not empty, not only is the element 4 print message printed but
- also a window of three text lines is displayed to the user. This window
- surrounds the place in the text where the phrase was found.
-
- This action is repeated until all the occurrences in all sentences have
- been found. The next element 3 is then used for the search. Multiple
- elements in a single ..ALLCOND are equivalent to the same elements in multiple
- ..ALLCOND's.
-
- The variable #COUNT is set to the number of times the phrase is found,
- and #LINE is set to the last line in which the phrase is found. Note that
- #LINE is set before the print message is printed, so the print message can
- print the line in which the phrase was found.
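-
- To make the acceptance rule for ..ALLCOND concrete, here is a small Python
- sketch. It is an approximation under assumptions, not Lancelot's code: the
- names to_regex and sentence_accepted are invented, the sentence is split
- into words before matching (so a percent phrase in .WORD1 is not handled),
- and search words are translated to regular expressions in the way the
- search word definitions suggest.
-
```python
import re

WORD_CHAR = r"[^.?!,;) \t\r\n]"   # any character that does not end a word

def to_regex(search_word):
    """Translate a search word (capitals and stars only) into a regex."""
    parts = []
    for ch in search_word:
        if ch == "*":
            parts.append(WORD_CHAR + "*")
        elif ch.isupper():
            parts.append("[" + ch + ch.lower() + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def sentence_accepted(sentence, first, alternatives=()):
    """Accept if `first` occurs and, when alternatives are given, any one
    of them occurs somewhere after it in the same sentence."""
    words = [w for w in re.split(r"[.?!,;) \t\r\n]+", sentence) if w]
    first_re = to_regex(first)
    alt_res = [to_regex(a) for a in alternatives]
    for i, word in enumerate(words):
        if first_re.fullmatch(word):
            if not alt_res:
                return True
            later = words[i + 1:]
            if any(r.fullmatch(w) for w in later for r in alt_res):
                return True
    return False
```
-
- Under these assumptions, the element ".Ha* .preference" accepts "We have a
- strong preference for tea." but rejects "A preference we have." because the
- words appear in the wrong order.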
-
- -------------------------------------------------------------
-
- ..allphrase
- :print message (unconditional)
- .word1 word2 word3 ... (NOTE: no dots - this is a phrase)
- :print message (conditional upon finding the phrase.)
- .optional next element 3
- :optional next element 4
-
- The text is searched for all sentences that contain the phrase WORD1
- followed by WORD2, followed by WORD3 ... If a sentence contains the
- phrase of element 3 then the sentence is accepted.
-
- When a sentence is found that is accepted and the conditional print
- message is not empty, not only is the element 4 print message printed but
- also a window of three text lines is displayed to the user. This window
- surrounds the place in the text where the phrase was found.
-
- This action is repeated until all phrase occurrences in all sentences
- have been found. The next element 3 is then used for the search. Multiple
- elements in a single ..ALLPHRASE are equivalent to the same elements in
- multiple ..ALLPHRASE's.
-
- The variable #COUNT is set to the number of times the phrase is found,
- and #LINE is set to the last line in which the phrase is found. Note that
- #LINE is set before the print message is printed, so the print message can
- print the line in which the phrase was found.
-
- -------------------------------------------------------------
-
- ..calc
- .#VARNAME1 = expression
- .next element
-
- The given variable is set to the value of the expression. The
- operators are plus (+), minus (-), times (*), and divide (/). There is no
- unary minus, so 0-#VAR must be used. Times and divide have higher precedence
- than plus and minus. This is one of the few commands that has no print
- message.
-
- Example:
- ..count {this will count the words - discussed below}
- :.
- .0 1000 {we want all words counted regardless of length}
- ..calc {now we estimate the number of characters in the text}
- .#CHARACTERS = #COUNT * 6
- ..type
- :Your text contained about #CHARACTERS characters.
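-
- The precedence rule for ..CALC expressions can be illustrated with a short
- Python sketch. This is a guess at the behavior rather than Lancelot's own
- evaluator: the function name evaluate is invented, operands are assumed to be
- whole numbers or variables, operators of equal precedence are assumed to be
- applied left to right, and division is assumed to be ordinary (non-truncating)
- division.
-
```python
import re

def evaluate(expression, variables):
    """Evaluate operands joined by + - * /, with * and / binding tighter.

    There is no unary minus and no parentheses, matching the text above.
    variables maps names such as "#COUNT" to numbers.
    """
    tokens = re.findall(r"#\w+|\d+|[-+*/]", expression)

    def operand(token):
        return variables[token] if token.startswith("#") else int(token)

    # First pass: fold every * and / into the term on its left.
    terms = [operand(tokens[0])]
    pending = []                      # the + and - operators, in order
    for i in range(1, len(tokens), 2):
        op, value = tokens[i], operand(tokens[i + 1])
        if op == "*":
            terms[-1] = terms[-1] * value
        elif op == "/":
            terms[-1] = terms[-1] / value
        else:
            pending.append(op)
            terms.append(value)

    # Second pass: apply + and - from left to right.
    result = terms[0]
    for op, value in zip(pending, terms[1:]):
        result = result + value if op == "+" else result - value
    return result
```
-
- With #COUNT equal to 100, evaluate("#COUNT * 6", {"#COUNT": 100}) gives 600,
- and evaluate("2 + 3 * 4", {}) gives 14 rather than 20.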
-
- -------------------------------------------------------------
-
- ..case
- :print message (unconditional)
- .#VARIABLE
- .<lower_bound> <upper_bound>
- :print message (conditional upon the value of #VARIABLE falling between
- lower_bound and upper_bound.)
- .optional next element 3
- :optional next element 4
-
- Example:
- ..sent {this will count the sentences - discussed below}
- :We are now testing for sentence length
- .Dr .Mr .Ms .Mrs
- ..case
- :.
- .#SLENGTH
- .50 100
- :Your longest sentence is too long. It was found in line #LINE.
- .100 1000
- :Your longest sentence is ridiculously long. It was found in line #LINE.
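-
- The selection made by ..CASE can be pictured with the following Python
- sketch. It rests on assumptions that this documentation does not confirm:
- the name case_messages is invented, both bounds are treated as inclusive (by
- analogy with the ..COUNT example, where ".2 3" counts words of length 2 or
- 3), and every range that contains the value prints its message.
-
```python
def case_messages(value, cases):
    """Return the messages whose range contains value.

    cases is a list of (lower_bound, upper_bound, message) tuples, one
    per element 3 / element 4 pair. Bounds are assumed inclusive.
    """
    return [message for lower, upper, message in cases
            if lower <= value <= upper]
```
-
- Under the inclusive assumption, a value of exactly 100 would fall in both
- ranges of the example above, so ranges such as ".50 99" and ".100 1000" may
- be a safer way to write it.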
-
- -------------------------------------------------------------
-
- ..clear
-
- ..CLEAR clears the screen. This usually follows a ..PAUSE command
- to give the reader time to read the screen before it is cleared.
-
- -------------------------------------------------------------
-
- ..comment
- :Print message (never printed)
-
- This command is used to include a comment in your analysis file.
-
-
-
- -------------------------------------------------------------
-
-
- ..cond
- :Print message (unconditional)
- .word1 .word2 .word3 ...
- :print message (conditional upon finding a phrase containing the above words)
- .optional next element 3
- :optional next element 4
-
- This is identical to ..ALLCOND except that only the first sentence found
- that contains the matching phrase is displayed to the user.
-
- -------------------------------------------------------------
-
- ..count
- :print message (unconditional)
- .<lower> <upper>
- :Print message
-
- Lancelot's #COUNT variable is set to the number of words in the
- entire text whose lengths fall between lower and upper.
-
- Example:
- ..count
- :Testing for word lengths
- .2 3
- :Your text contains #COUNT words whose length is either 2 or 3
- .10 20
- :Your text has #COUNT long words
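-
- A minimal Python sketch of the counting done by ..COUNT follows. It is an
- illustration only: the name count_words_by_length is invented, words are
- assumed to be separated by the terminator characters listed under search
- words, and the bounds are taken as inclusive, as the "2 or 3" example above
- suggests.
-
```python
import re

def count_words_by_length(text, lower, upper):
    """Count the words of text whose length is between lower and upper."""
    # Split on the word terminators; drop the empty strings that result.
    words = [w for w in re.split(r"[.?!,;) \t\r\n]+", text) if w]
    return sum(1 for w in words if lower <= len(w) <= upper)
```
-
- For the text "It is a very long sentence, indeed." the call
- count_words_by_length(text, 2, 3) returns 2 (for "It" and "is").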
-
- -------------------------------------------------------------
-
- ..double
- :print message (unconditional)
- .word1 .word2 ...
- :print message (conditional)
-
- The text is searched for double words, i.e., two sequential words
- which are the same. If a double word is found and is not in the list of
- words (element 3), then the print message is printed. The word "had" should
- probably be included in the list of excluded words because "had had" is proper
- English.
-
- The variable #COUNT is set to the number of double words found,
- and #LINE is set to the last line in which double words are found. Note that
- #LINE is set before the print message is printed, so the print message can
- print the line in which the double word was found.
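-
- The double word search can be sketched in Python as shown here. This is not
- Lancelot's algorithm: the name find_double_words is invented, the comparison
- is assumed to ignore case, and a pair separated by punctuation (as in
- "end. End") is assumed not to count as a double word.
-
```python
def find_double_words(text, excluded=()):
    """Return (word, line_number) for each repeated word not in excluded."""
    skip = {word.lower() for word in excluded}
    hits = []
    previous = None
    for line_number, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            word = token.rstrip(".?!,;)").lower()
            if word and word == previous and word not in skip:
                hits.append((word, line_number))
            # Trailing punctuation breaks the pair, so reset in that case.
            previous = word if token.lower() == word else None
    return hits
```
-
- In this sketch #COUNT would correspond to len(hits) and #LINE to the line
- number in the last entry. Because previous is carried across lines, a word
- repeated at a line break is also reported.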
-
- -------------------------------------------------------------
-
- ..end
-
- The analysis file must end with ..end.
-
- -------------------------------------------------------------
-
- ..histo
- :print message (unconditional)
- .word1 .word2 ...
-
- A histogram of your sentence lengths is printed. Each line gives the
- number of the sentence in the text, the line number of the text in which the
- sentence appears, and the sentence length. The sentence length is shown by a
- bar of asterisks, one asterisk per word in the sentence. If the sentence
- length is longer than 64 words, then a plus sign is shown at the right end of
- the asterisks, indicating that the histogram overflows the right margin.
-
- The list of dot words has the same meaning as in the ..SENT command.
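-
- The bar drawn for each sentence can be sketched in Python as follows. The
- function name histogram_bar is invented, and the sketch assumes that exactly
- 64 asterisks are printed before the plus sign when a sentence overflows.
-
```python
def histogram_bar(word_count, width=64):
    """Return one asterisk per word, capped at width, with '+' on overflow."""
    bar = "*" * min(word_count, width)
    if word_count > width:
        bar += "+"
    return bar
```
-
- A 3-word sentence gives "***", while a 70-word sentence gives 64 asterisks
- followed by a plus sign.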
-
- -------------------------------------------------------------
-
- ..most
- :print message (unconditional)
- .word1 .word2 .word3 ... <number>
-
- The most frequently used words of the text are displayed along with the
- number of occurrences of each. <number> determines how many words are
- displayed. The words .WORD1 ... are words which are to be excluded from the
- display. Typically these are "a", "the", etc., and the word processing symbols
- "p", "b", etc. There is no conditional print message (element 4). Also, at
- least one .word must be present.
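-
- A frequency list of this kind can be sketched in Python with
- collections.Counter. The name most_frequent is invented, and the word
- splitting and case folding are assumptions based on the definitions of
- literal words earlier in this document.
-
```python
import re
from collections import Counter

def most_frequent(text, excluded, number):
    """Return the `number` most common words as (word, count) pairs."""
    skip = {word.lower() for word in excluded}
    # Literal words are not case-sensitive, so fold everything to lowercase.
    words = [w.lower() for w in re.split(r"[.?!,;) \t\r\n]+", text) if w]
    counts = Counter(w for w in words if w not in skip)
    return counts.most_common(number)
```
-
- For "The cat and the dog and the bird." with "the" excluded and <number>
- set to 1, the sketch reports the word "and" with a count of 2.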
-
- -------------------------------------------------------------
-
- ..none
- :print message (conditional)
-
- If the preceding ..ALLCOND, ..COND, ..DOUBLE, or ..WORD dot-dot command
- did not find whatever it was looking for, then the print message is printed.
-
- -------------------------------------------------------------
-
- ..pause
-
- The display screen is frozen until the user presses a key.
-
- -------------------------------------------------------------
-
- ..phrase
- :Print message (unconditional)
- .word1 word2 word3 ...
- :print message (conditional upon finding the phrase)
- .optional next element 3
- :optional next element 4
-
- This is identical to ..ALLPHRASE except that only the first sentence
- found that contains the matching phrase is displayed to the user.
-
-
-
- -------------------------------------------------------------
-
-
- ..sent
- :print message (unconditional)
- .word1 .word2 ...
-
- ..SENT counts sentences and finds the longest sentence. A sentence
- is a group of words ended by a word whose last character is a period (.),
- exclamation point (!), or question mark (?). A word does not end a sentence
- if it is a one-letter word or is one of the words given in the list of words
- (element 3). This list of words will frequently include .etc, .Dr, .Mr, .Mrs,
- and so on.
-
- #COUNT is set to the number of sentences. #SLENGTH is set to the
- length of the longest sentence. #LINE is set to the line number of the
- longest sentence.
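-
- The sentence rule can be sketched in Python as shown here. This is an
- interpretation of the description above, not Lancelot's code: the name
- split_sentences is invented, the exception list is assumed to be compared
- without regard to case, and sentence length is assumed to be measured in
- words.
-
```python
def split_sentences(text, exceptions=()):
    """Split text into sentences, each returned as a list of words."""
    skip = {word.lower() for word in exceptions}
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?":
            bare = token.rstrip(".!?")
            # One-letter words (such as initials) and listed words such
            # as "Dr" do not end a sentence.
            if len(bare) > 1 and bare.lower() not in skip:
                sentences.append(current)
                current = []
    if current:
        sentences.append(current)
    return sentences
```
-
- With exceptions set to ["Dr"], the text "Dr. Smith met J. Doe. They talked!"
- is split into two sentences of 5 and 2 words, so #COUNT would be 2 and
- #SLENGTH would be 5. Without the exception, "Dr." would end a sentence of
- its own.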
-
- -------------------------------------------------------------
-
- ..show
- :print message (unconditional)
- .word1 .word2 .word3 ...
-
- The number of occurrences in the text of the literal words of element
- 3 is displayed to the user. There is no conditional print message (element
- 4).
-
- -------------------------------------------------------------
-
- ..type
- :print message (unconditional)
-
- ..TYPE is used to unconditionally print a message.
-
- -------------------------------------------------------------
-
- ..word
- :print message (unconditional)
- .word1 .word2 .word3 ... <number> or <number>%
- :print message (conditional)
- .optionally next element 3
- :optionally next element 4
-
- The entire text is searched for occurrences of the literal words of
- element 3. If the number of occurrences of .WORD1 added to the number of
- occurrences of .WORD2, ... equals or exceeds <number>, then the conditional
- message is printed. If <number> is followed by %, the sum of the occurrences
- of the element 3 words must equal or exceed that percentage of the words in
- the text before the element 4 print message is displayed.
-
- The variable #COUNT is set to the number of times the word(s) were
- used.
-
-
-
-
- Example:
- ..word
- :.
- .very .large 10
- :Your paper has used the words VERY or LARGE #COUNT times. This is
- a very large number of occurrences.
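-
- The threshold test made by ..WORD, including the percent form, can be
- sketched in Python as follows. The name word_action is invented, and the
- word splitting is an assumption based on the terminator list given under
- search words; the sketch is not Lancelot's own code.
-
```python
import re

def word_action(text, literal_words, threshold):
    """Return (count, fires) for a ..WORD element.

    threshold is written as in the analysis file: "10" for an absolute
    number of occurrences, or "10%" for a percentage of all words.
    """
    wanted = {word.lower() for word in literal_words}
    words = [w for w in re.split(r"[.?!,;) \t\r\n]+", text) if w]
    # Literal words are not case-sensitive.
    count = sum(1 for w in words if w.lower() in wanted)
    if threshold.endswith("%"):
        needed = len(words) * float(threshold[:-1]) / 100
    else:
        needed = int(threshold)
    return count, count >= needed
```
-
- For the 7-word text "A very large and very big dog." and the words "very"
- and "large", the count is 3, so a threshold of "3" or "40%" fires the
- conditional message while "4" or "50%" does not.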