1987-12-07
ART-CEE
USER'S GUIDE
ART-CEE is a generic inference engine. It was written in MIX C on an
Apricot F1, a run-of-the-mill MS-DOS machine. The source and load modules
have been successfully compiled and executed on IBM XT, AT and clone machines
and should port easily to any MS-DOS machine. The source code also should
be fairly easy to convert to other C dialects.
Inference engines are a category of artificial intelligence programs.
Like traditional databases, they structure information for later retrieval.
But they also can guide the user through solving problems, point out
unnoticed connections within the data and suggest new possibilities. They
are capable of working with certainties as well as 'fuzzy' information
which is incomplete or tentative.
Information is entered into inference engines as rules. In ART-CEE's
case, these rules are "If...then" propositions. Such rules should be closely
related and basic to the body of knowledge concerned; the quality of the
system built with the inference engine depends most on the accuracy and ade-
quacy of these rules. The user should enter enough rules to draw the major
connections between the data items, but there is no need to enter everything
that there is to know--after all, inference engines are designed to draw
their own inferences.
Information is retrieved from inference engines through queries. A query
is a question or a command which causes the inference engine to search the
database, extract the information it knows immediately and/or draw the
necessary inferences to come as close as possible to meeting the request, and
display both its conclusions and how it arrived at them. ART-CEE has three
kinds of queries, which together offer you a broad range of services.
Each rule in ART-CEE is associated with a percentage of truth, occurrence
or certainty. If the rule is always true its percentage is 100; if it is
never true its percentage is 0. ART-CEE also uses this number to mark some
rules as logically impossible and to prevent the entering of tautologies.
These percentages are used to control the query process.
GETTING STARTED
Before you do anything, make a copy of all ART-CEE files, hide that
copy from neighbors, friends, angry spouses, magnetic fields, dust and
nuclear explosion.
ART-CEE is delivered as four source files. Compile each source file
individually. If you are using the MIX system, you may wish to optimize
WORKUP2.C for speed. Then link the four object files together and name the
output file ART-CEE.COM. The program will run faster if you link the runtime
functions with the WORKUP objects, but this will increase the size of the
final .COM file to above 60K.
If you find that ART-CEE's space requirements exceed what is available,
the amount of 'stack' and 'heap' used can be affected greatly by adjusting
the value of MAX in the header files. ART-CEE is supplied with a value of 60
for that variable. All source files must carry exactly the same value for MAX.
Locate the following files in the same directory on the default drive:
ART-CEE.COM, HELP1.AIH, HELP2.AIH, HELP3.AIH, HELP4.AIH. If you chose not to
link the MIX runtime functions with the WORKUP objects, the MIX C library
functions also must reside in that directory, unless a PATH command has been
executed. The runtime module also must have access to the MS-DOS i-o routines
on the system disk; screen blanking will not succeed if these routines are
not available on the default drive or through a PATH command.
MAIN MENU AND PROMPT
The main input screen displays a menu of all available input options.
Commands and default settings may be addressed by entering the single
letter in the appropriate menu phrase. For instance, to load a data file,
enter "L". Usually the letter to be entered is the first letter in the
phrase, but in any case it is the only capitalized letter. Do not enter
the full word or phrase; ART-CEE will try to interpret that entry as a rule
or subject. ART-CEE will convert your inputs to all-capitals; the program
will execute more quickly if you enter them as capitals in the first place.
All inputs to ART-CEE must end in the "Return" or "Enter" key. Beginning with
version 1.4, inputs are processed without regard to case.
Rules and queries demand a more involved input. Any input at the main
prompt that is longer than one character will be treated as a rule or query.
RULES
All rules must begin with the word "IF " and contain the word " THEN ".
Between these words must be a subject, and after the " THEN " must be a
predicate (apologies to English teachers who know that such phrases are not
grammatical subjects and predicates). The entire rule cannot be longer than
eighty characters. Neither subject nor predicate should contain punctuation.
It follows that the rule should not end in punctuation (the reason for this is
that later matching against the predicate would have to include the same
punctuation, which probably would be incorrect for that usage and a mess to
remember). Any leading grammatical article ("A ", "AN " or "THE ") in sub-
ject or predicate will be ignored.
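
The parsing rules above can be sketched as follows. This is an illustration in Python, not the MIX C source; the function names `parse_rule` and `strip_article` are hypothetical, and the sketch does not check for punctuation inside the subject or predicate.

```python
ARTICLES = ("A ", "AN ", "THE ")
MAX_RULE_LENGTH = 80  # limit stated in this guide


def strip_article(phrase):
    """Drop one leading article, since the guide says it is ignored."""
    for article in ARTICLES:
        if phrase.startswith(article):
            return phrase[len(article):].strip()
    return phrase


def parse_rule(line):
    """Return (subject, predicate), or None if the line is not a valid rule."""
    text = line.strip().upper()  # inputs are converted to capitals
    if len(text) > MAX_RULE_LENGTH or not text.startswith("IF "):
        return None
    subject, separator, predicate = text[3:].partition(" THEN ")
    if not separator:
        return None  # the word " THEN " is missing
    subject = strip_article(subject.strip())
    predicate = strip_article(predicate.strip())
    if not subject or not predicate:
        return None
    return subject, predicate


print(parse_rule("if a collie then the dog"))  # ('COLLIE', 'DOG')
```

Because the leading article is dropped, "IF A COLLIE THEN THE DOG" and "IF COLLIE THEN DOG" refer to the same rule.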
ART-CEE searches the database on each rule input to determine if it already
knows the subject or predicate. If it finds that the rule already exists, you
are given the opportunity to enter new percentages of occurrence for that rule.
Otherwise, if there is room in the database for the new rule, the rule is
added to the knowledge base. If override of default percentages is turned on,
you are asked to input forward and reverse percents. Otherwise the defaults
are used.
ART-CEE can handle up to MAX number of different subjects and/or predi-
cates. Internally, subjects and predicates are stored identically, as the
subject of one rule likely will be the predicate of another. The maximum
number of rules that can be handled is MAX * (MAX - 1).
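
One way to picture these limits is a square table with one row and one column per subject. The sketch below is in Python and is only an assumption about the layout, not a description of the actual MIX C data structures. The names `new_database`, `subject_index` and `add_rule`, the marker value -1.0, and the idea that a reverse percentage is stored as the forward percentage of the mirrored rule are all hypothetical.

```python
MAX = 60           # value the guide says ART-CEE is supplied with
IMPOSSIBLE = -1.0  # assumed marker; the guide only says "negative"


def new_database(size=MAX):
    """Build an empty size-by-size table with tautologies marked impossible."""
    table = [[0.0] * size for _ in range(size)]
    for i in range(size):
        table[i][i] = IMPOSSIBLE  # IF A THEN A
    return {"subjects": [], "percent": table}


def subject_index(db, name):
    """Find a subject's position, adding it if there is room."""
    if name in db["subjects"]:
        return db["subjects"].index(name)
    if len(db["subjects"]) >= len(db["percent"]):
        raise ValueError("database is full")
    db["subjects"].append(name)
    return len(db["subjects"]) - 1


def add_rule(db, subject, predicate, forward, reverse):
    """Store a rule in both directions; tautologies stay impossible."""
    if subject == predicate:
        raise ValueError("IF A THEN A is a tautology")
    s = subject_index(db, subject)
    p = subject_index(db, predicate)
    db["percent"][s][p] = forward
    db["percent"][p][s] = reverse


db = new_database()
add_rule(db, "HUMAN", "FEMALE", 50.0, 1.0)
rule_slots = sum(1 for row in db["percent"] for cell in row if cell != IMPOSSIBLE)
print(rule_slots)  # 3540, which is MAX * (MAX - 1)
```

Every cell off the diagonal is a possible rule, which is where the figure MAX * (MAX - 1) comes from.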
A forward percentage of occurrence refers to the percent of time that
the rule is true as entered. If fifty percent of all humans are female, then
the rule "IF HUMAN THEN FEMALE" would have a forward percentage of 50.00000
(trailing zeroes need not be entered). Reverse percentages refer to the
percent of time that the rule is true in reverse format. Using the example
above, if one percent of all females are human, the reverse percentage for
"IF HUMAN THEN FEMALE" would be 1.000000. Percentages must be not less than
zero (zero means "never true") and not greater than one hundred (one hundred
means "always true"). ART-CEE stores impossible rules with negative percentages
of occurrence, but you cannot enter them at rule entry time (see commands B
and G).
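
As a worked example of the two percentages, the Python sketch below derives them from raw counts. The counts are hypothetical, chosen only to reproduce the 50 and 1 percent figures above, and the function names are illustrative.

```python
def percentages(subject_count, predicate_count, both_count):
    """Return (forward, reverse) percentages for IF subject THEN predicate."""
    forward = 100.0 * both_count / subject_count    # share of subjects that are predicates
    reverse = 100.0 * both_count / predicate_count  # share of predicates that are subjects
    return forward, reverse


def is_valid_percentage(value):
    """Accept the range allowed at rule entry: zero through one hundred."""
    return 0.0 <= value <= 100.0


# Hypothetical counts: 200 humans, 10000 females, 100 female humans.
forward, reverse = percentages(subject_count=200, predicate_count=10000, both_count=100)
print(forward, reverse)  # 50.0 1.0
```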
QUERIES
Three query formats are available in ART-CEE: simple query, two-element
query and thinking.
The simple query is a request for all rules concerning just one subject
in the database. It reports only those rules which contain the subject as
subject or predicate; no inferences are drawn. Only positive rules are
reported; if a potential rule has not been entered, or if the rule is marked
as impossible, it is not reported. Simple queries are entered by entering
any of the following phrases, followed by the subject: WHO, WHO IS, WHO IS A,
WHO IS AN, WHO IS THE, WHAT, WHAT IS, WHAT IS A, WHAT IS AN, WHAT IS THE,
DESCRIBE, DESCRIBE A, DESCRIBE AN, or DESCRIBE THE. The subject may be
followed by a question mark, which is ignored. ART-CEE searches the database
for an exact match on the subject. If the subject is found, all forward
rules for that subject are reported to the computer monitor, followed by all
reverse rules.
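
A minimal Python sketch of the simple query follows. It assumes rules are held in a dictionary keyed by (subject, predicate), and it reads "reverse rules" as the rules in which the queried subject appears as the predicate. Both points are assumptions made for illustration, as are the names `query_subject` and `simple_query`.

```python
QUERY_PREFIXES = sorted(
    ["WHO", "WHO IS", "WHO IS A", "WHO IS AN", "WHO IS THE",
     "WHAT", "WHAT IS", "WHAT IS A", "WHAT IS AN", "WHAT IS THE",
     "DESCRIBE", "DESCRIBE A", "DESCRIBE AN", "DESCRIBE THE"],
    key=len, reverse=True)  # try the longest phrase first


def query_subject(line):
    """Return the subject of a simple query, or None if no phrase matches."""
    text = line.strip().upper().rstrip("?").strip()  # trailing "?" is ignored
    for prefix in QUERY_PREFIXES:
        if text.startswith(prefix + " "):
            return text[len(prefix) + 1:].strip()
    return None


def simple_query(rules, subject):
    """List positive rules only: forward rules first, then reverse rules."""
    forward = [(s, p, pct) for (s, p), pct in rules.items() if s == subject and pct > 0]
    reverse = [(s, p, pct) for (s, p), pct in rules.items() if p == subject and pct > 0]
    return forward + reverse


rules = {
    ("COLLIE", "DOG"): 100.0,
    ("DOG", "MAMMAL"): 100.0,
    ("DOG", "COLLIE"): 5.0,
    ("DOG", "CAT"): -1.0,  # impossible, so never reported
}
for s, p, pct in simple_query(rules, query_subject("What is a dog?")):
    print("IF", s, "THEN", p, pct)
```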
The two-element query asks if a certain rule is true. Using the above
example, to find out what percentage of all females are human, the input
would be: "IF FEMALE THEN HUMAN?" Note that the input is exactly like
the input for the entering of the rule, except that the rule ends in a ques-
tion mark. All parsing rules applicable for rules are applicable for two-
element queries. If subject and predicate are in the database, the query
begins. If the rule already exists in the database it is reported, and the
query search ends. If the rule does not exist (i.e., is marked as having a
zero percentage of occurrence), ART-CEE attempts to find any way possible to
chain together enough inferences to draw a conclusion about the rule. For
instance, suppose that ART-CEE does not know directly how many females are
human, but it does know the following rules:
IF FEMALE THEN MAMMAL 20%
IF MAMMAL THEN HUMAN 3%
ART-CEE will link these two rules together and conclude that the rule "IF
FEMALE THEN HUMAN" is true 0.6% of the time (20% times 3%). It reports
the chain that it used to draw this conclusion on the monitor and asks if
you agree with each step in the chain. If you disagree with any step, the
entire chain beginning with the part that you rejected is abandoned, and
the search continues. If you agree with the entire chain, ART-CEE asks
whether you wish to add the new fact ("IF FEMALE THEN HUMAN" 0.6%) to the
database and complies with your decision. The search resumes to find other ways
that FEMALE and HUMAN can be linked until all possible links have been
examined.
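
The chaining arithmetic can be sketched in Python as a depth-first search that multiplies percentages along each chain. This is an illustration under stated assumptions, not the ART-CEE search itself: the dictionary layout and the name `find_chains` are hypothetical, and the step-by-step confirmation by the user is left out.

```python
def find_chains(rules, start, goal, path=None, percent=100.0):
    """Yield (chain, percentage) for every loop-free chain from start to goal."""
    path = path or [start]
    for (subject, predicate), pct in rules.items():
        # Follow only positive rules that continue the chain without looping.
        if subject != path[-1] or pct <= 0 or predicate in path:
            continue
        combined = percent * pct / 100.0
        if predicate == goal:
            yield path + [predicate], combined
        else:
            yield from find_chains(rules, start, goal, path + [predicate], combined)


rules = {("FEMALE", "MAMMAL"): 20.0, ("MAMMAL", "HUMAN"): 3.0}
chains = list(find_chains(rules, "FEMALE", "HUMAN"))
for chain, pct in chains:
    print(" -> ".join(chain), round(pct, 6))  # FEMALE -> MAMMAL -> HUMAN 0.6
```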
The two-element query also can work with incomplete data. Sometimes it
is possible to make a connection between two subjects only if one additional
rule is assumed to be true. For instance, suppose that the following rules
are known:
IF FEMALE THEN MAMMAL 20%
IF MAMMAL THEN TWO-LEGGED 3%
IF TAILLESS THEN HUMAN 10%
ART-CEE cannot conclude IF FEMALE THEN HUMAN unless one of the following
assumptions is made: FEMALEs are TAILLESS, MAMMALs are TAILLESS, TWO-LEGGED
implies TAILLESS, MAMMALs are HUMAN or TWO-LEGGED implies HUMAN. By setting
the default number of assumptions on the main menu (function A), the number
of assumptions which will be included in each attempt to chain subjects to-
gether is controlled. Changing the number of assumptions to 1 allows 1
such assumption per attempt, etc. The maximum number of assumptions allowed
in any one chain is MAX - 3. Increasing the number of allowed assumptions
increases the power of the two-element query, but it also geometrically
increases the effort both program and user must exert to get through the query.
If the number of assumptions is nonzero ART-CEE may request information from the
user at what seems to be odd moments. When the user is asked to agree to the
assumption as drawn, a "Y" response will result in the new rule's addition
to the database, and the user will be asked for the percentage of occurrence
for that rule, whether or not the override of defaults is on.
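
To see where the five candidate assumptions in the example come from, the Python sketch below lists every unknown rule that would link something reachable from the starting subject to something that already leads to the goal. The approach and the names `reachable` and `single_assumptions` are assumptions made for illustration. The guide does not describe how ART-CEE generates its candidates.

```python
def reachable(rules, origin, forward=True):
    """Subjects linked to origin by chains of positive rules (origin included)."""
    seen = {origin}
    frontier = [origin]
    while frontier:
        current = frontier.pop()
        for (subject, predicate), pct in rules.items():
            source, target = (subject, predicate) if forward else (predicate, subject)
            if pct > 0 and source == current and target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen


def single_assumptions(rules, start, goal):
    """List unknown rules that would, if assumed, link start to goal."""
    from_start = reachable(rules, start, forward=True)
    to_goal = reachable(rules, goal, forward=False)
    return sorted(
        (s, p) for s in from_start for p in to_goal
        if s != p and (s, p) != (start, goal) and rules.get((s, p), 0.0) == 0.0
    )


rules = {
    ("FEMALE", "MAMMAL"): 20.0,
    ("MAMMAL", "TWO-LEGGED"): 3.0,
    ("TAILLESS", "HUMAN"): 10.0,
}
candidates = single_assumptions(rules, "FEMALE", "HUMAN")
for subject, predicate in candidates:
    print("IF", subject, "THEN", predicate)
```

With the three rules from the example, the sketch prints the same five assumptions listed above.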
The final query form, "thinking", is an automated extension of the two-
element query. Every subject in the database is chained to every other sub-
ject in the database. Before the query is executed, ART-CEE goes through the
database marking all logical impossibilities. These impossibilities take
the form of:
IF A THEN B cannot be true (i.e., IF A THEN B has a negative percentage)
IF C THEN B is always true (i.e., IF C THEN B has a 100% percentage)
Therefore, IF A THEN C can never be true.
Then ART-CEE begins chaining. Only rules with positive percentages are
included in the chains (i.e., no assumptions are drawn). All chains are extended
as far as they will go, up to the think depth setting shown on the main menu.
That is, if the think depth setting is 3, then no chain will be extended be-
yond three subjects. The minimum depth setting is three, and the maximum
setting is MAX - 1. While increasing the depth setting increases the prob-
ability of finding all possible inferences, it also greatly increases the
time necessary to perform the query. Seldom does a depth of more than four
or five prove efficient. All connections drawn in the "thinking" function
are applied to the database, and the greatest percentage found for each rule
is the final one saved in the database. No impossible rules are changed.
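
The two phases of "thinking" can be sketched in Python as shown below. This is a simplified illustration, not the ART-CEE algorithm. The dictionary layout, the marker value -1.0 and the function names are hypothetical, and the sketch assumes that an existing positive percentage is kept when it is greater than the inferred one.

```python
IMPOSSIBLE = -1.0  # assumed marker; the guide only says "negative"


def mark_impossibilities(rules, subjects):
    """If A->B is impossible and C->B is always true, mark A->C impossible."""
    for a in subjects:
        for b in subjects:
            if rules.get((a, b), 0.0) >= 0:
                continue  # A->B is not marked impossible
            for c in subjects:
                always = rules.get((c, b), 0.0) == 100.0
                if c != a and always and rules.get((a, c), 0.0) <= 0:
                    rules[(a, c)] = IMPOSSIBLE


def think(rules, subjects, depth=3):
    """Chain every subject to every other, keeping the greatest percentage."""
    mark_impossibilities(rules, subjects)
    known = {pair: pct for pair, pct in rules.items() if pct > 0}
    best = {}

    def extend(path, percent):
        if len(path) >= depth:
            return  # no chain grows beyond `depth` subjects
        for (s, p), pct in known.items():
            if s != path[-1] or p in path:
                continue
            combined = percent * pct / 100.0
            pair = (path[0], p)
            # Chains of three or more subjects yield an inferred rule.
            if len(path) >= 2 and rules.get(pair, 0.0) >= 0:
                best[pair] = max(best.get(pair, 0.0), combined)
            extend(path + [p], combined)

    for subject in subjects:
        extend([subject], 100.0)
    for pair, pct in best.items():
        rules[pair] = max(rules.get(pair, 0.0), pct)


rules = {("FEMALE", "MAMMAL"): 20.0, ("MAMMAL", "HUMAN"): 3.0}
think(rules, ["FEMALE", "MAMMAL", "HUMAN"], depth=3)
print(round(rules[("FEMALE", "HUMAN")], 6))  # 0.6
```

Impossible rules are skipped when results are recorded, so they are never overwritten.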
COMMANDS
A brief discussion of each command follows, each prefixed by the
single-character entry that invokes it.
A Set number of assumptions that will be allowed in any single chain
in the two-element query. Minimum number is 0; maximum is MAX - 3.
B Enter a file containing mutually exclusive subjects, and mark all
occurrences of those subjects in the database as mutually exclusive.
For instance, assume that a file contains the following facts:
DOG, CAT, PIG, HORSE, COW. Assume also that the database contains
the subjects DOG, CAT, HORSE, COW. This command will mark the following
rules as impossible:
IF DOG THEN CAT      IF DOG THEN HORSE    IF DOG THEN COW
IF CAT THEN DOG      IF CAT THEN HORSE    IF CAT THEN COW
IF HORSE THEN DOG    IF HORSE THEN CAT    IF HORSE THEN COW
IF COW THEN DOG      IF COW THEN HORSE    IF COW THEN CAT
The PIG item in the file is ignored.
C Change a subject without changing any percentages associated with it.
For instance, the subject "PIG" could be changed to "SWINE". Then
all rules that referenced "PIG" will now reference "SWINE".
D Drop a rule. Suppose that the database contains the rule "IF COLLIE
THEN DOG". By selecting this option, the rule can be marked as having
a zero percentage of occurrence. If this is the only rule referencing
COLLIE, then the subject COLLIE is erased. The same would occur to
DOG if this were the only rule referencing DOG.
F Set default forward percentage of occurrence. Valid values for this
default are not less than zero and not greater than one hundred. This
setting can be used to great advantage if a large number of similarly-
occurring rules are to be entered at the same setting. Set the default
appropriately, set the default reverse percentage as well, turn off
the override option, and just enter the rules. All prompts for per-
centage inputs will be skipped.
G Enter a group of mutually exclusive subjects from the keyboard. The
function works the same way as "B", except that the keyboard is the
source of information. A single letter "E" ends the input stream and
begins the marking of subjects. Any number of subjects can be en-
tered, but only those actually found in the database will be stored.
After the subjects are marked, you will be given opportunity to save
the group for later use as an input file under option "B".
H Help screens are available online. These screens are an abbreviated
version of this user's guide. The function will work only if the
help files are located in the default directory on the default drive
or an MS-DOS PATH command was issued prior to entering ART-CEE.
I Initialize the database. This function erases all subjects in the
database and marks all percentages of occurrence as zero except for
tautologies (IF A THEN A), which are marked as impossible.
K Set depth of thinking chaining. See the discussion of thinking under
queries, above.
L Load a data file from disk. The database developed in a previous
ART-CEE session can be reloaded using this function. If the file con-
tains fewer than MAX subjects, the data in the file will overlay the
contents of the database now occupying the positions from which the
file was written and leave the remainder of the database untouched. If
the highest-numbered subject in the file exceeds the current value of
MAX, the file cannot be successfully loaded.
M Toggle the showing of the main menu. If the current setting is "Y"
(show the menu), choosing this function will change the setting to
"N" (do not show the menu). Once the commands and query/rule format
are well-known, significant time can be saved by switching the setting
to "N".
O Toggle the overriding of default percentages of occurrence. If the
current setting is "Y" (yes), choosing this function will change the
setting to "N" (no), and vice versa.
P Print the database. All rules in the database will be copied to the
printer (LPT1), together with their percentages of occurrence. These
rules will be grouped by subject, all forward references first and
then reverse references. Therefore, all rules will be printed
twice. If all subjects are used and all subjects have a completely
filled set of rules, the number of lines printed will be MAX * 2 *
(MAX + 2).
R Set the default percentage used for reverse references. Valid values
are not less than zero and not greater than one hundred.
S Save the database to disk. The database can be saved in whole or
in part. To save just part of the database, enter the starting and
ending positions when prompted (use function V to determine what
subject is in what position). If the present database was loaded
from disk, the filename from which the database was loaded will be
offered to you as a default, otherwise the filename "ART-CEE.DAT" will
be offered. The default can be overridden by entering any other
name when prompted. If a file by the chosen name already exists, it
will be overwritten without backup.
T Think. See discussion under "QUERIES".
V View the database. All used subjects are listed in order of entry,
followed by the number of forward references and number of reverse
references for each.
X Exit the program. You will be asked if you wish to save the database
before the exit occurs. The default is not to save the database.
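
The marking performed by the grouping commands "B" and "G" can be sketched in Python as follows. The name `mark_exclusive`, the dictionary layout and the marker value -1.0 are hypothetical; only the behavior (pairwise marking, unknown subjects ignored) comes from the descriptions above.

```python
IMPOSSIBLE = -1.0  # assumed marker; the guide only says "negative"


def mark_exclusive(rules, known_subjects, group):
    """Mark every pairing of known group members as impossible, both ways."""
    present = [name for name in group if name in known_subjects]
    for first in present:
        for second in present:
            if first != second:
                rules[(first, second)] = IMPOSSIBLE
    return present  # the members actually found in the database


rules = {}
present = mark_exclusive(
    rules,
    known_subjects={"DOG", "CAT", "HORSE", "COW"},
    group=["DOG", "CAT", "PIG", "HORSE", "COW"],
)
print(present)     # ['DOG', 'CAT', 'HORSE', 'COW']
print(len(rules))  # 12
```

PIG is not in the database, so it is skipped, and the four remaining subjects produce the twelve impossible rules listed under command "B".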
ART-CEE AND DATA STRUCTURES
Like all software, ART-CEE has to deal with four possible relationships
within the data it handles. The simplest and least significant is the one-
to-one relationship. An item in the database has just one connection with
one other item in the database, and that's all. "IF A then B" defines such
a relationship. If "A" is true then "B" is always true; if it were other-
wise, the rule would be just one part of a larger set of truths, and "A"
would connect with at least one more item besides "B".
The one-to-many relationship involves multiple possibilities rising
from the same item. "If A then B" is true part of the time, and "If A then
C" also is true part of the time, but "B" and "C" cannot be true at the
same time.
The many-to-one relationship is created by such combinations as "If C
then A" and "If B then A". Two items have the same kind of relationship
with another item in the database; they lead to the same conclusion. This
relationship is not the same as "If C then if B then A".
Finally, the many-to-many relationship combines the one-to-many and
many-to-one relationships. The following rules constitute a many-to-many
relationship:
If A then B.
If A then C.
If C then B.
If B then C.
The relationships between items intertwine.
Expressing one-to-one relationships with ART-CEE is straightforward.
Just enter a new rule. However, unless more rules are entered, transforming
the one-to-one into one of the other types, the one-to-one relationship
amounts to little more than a redefinition of the subject.
One-to-many and many-to-one relationships are created by entering
several rules having items in common on the 'many' side of the relationship.
The grouping functions (commands 'B' and 'G') are then used to mark these
common elements as having a mutually exclusive relationship with each
other. The result is a hierarchy of information that can be diagrammed as
a triangle:
        A                   B     C
       / \                   \   /
      /   \        or         \ /
     B     C                   A
Use of the grouping functions assures that all relationships in the
above diagrams are vertical ("If B then C" and vice versa cannot be
true without transforming the relationships into the many-to-many kind).
This structure is appropriate for classification schemes, diagnosis patterns
and family trees.
Many-to-many relationships form a more nebulous pattern in which each
element theoretically can rise from and lead to every other item. Such
structures can be used to identify patterns in which statistics are known
about the data but overall structure of the data is unclear. Many-to-
many patterns can also be used to interrelate two or more hierarchies.
The big factor with ART-CEE is that ART-CEE just loves to transform hier-
archies (many-to-one and one-to-many relationships) into many-to-many forms.
The drawing of inferences is the way that ART-CEE does this transformation.
If you have built a hierarchical structure and want to keep it that way,
be careful to do the following:
1. Do not use the 'think' function against your permanent database.
2. Do not add query findings to the database.
3. Do not use assumptions.
4. Use the 'grouping' functions completely.
If you wish to break any of the above rules, be sure to save the database first,
then do not save the database when exiting from ART-CEE. Of course, if you
are working with a nebulous, highly interconnected database to begin with,
feel free to infer to your heart's content.