Club Amiga de Montreal

home *** CD-ROM | disk | FTP | other *** search

/ Club Amiga de Montreal - CAM / CAM_CD_1.iso / files / 406.lha / OPS5c / ops5c.doc.pp / ops5c.doc

Wrap

Text File | 1990-08-07 | 117KB | 3,037 lines

OPS5c User's Guide Version 1.08a For the Amiga Personal Computer Bernie J. Lofaso, Jr. Table of Contents 1. Introduction.............................................4 1.1 What is OPS5c?........................................4 1.2 OPS5 History..........................................4 1.3 Production Systems Overview...........................5 2. Working Memory...........................................9 2.1 Class Declarations....................................9 2.2 Working Memory Element Structure......................9 2.3 Vector Attributes....................................10 3. Rules...................................................12 3.1 Rule Structure.......................................12 4. The LHS.................................................13 4.1 Condition Elements...................................13 4.2 Variables............................................13 4.3 Quoting..............................................13 4.4 Predicates...........................................14 4.5 Conjuctions..........................................14 4.6 Disjunctions.........................................15 4.7 Negated Condition Elements...........................15 4.8 Element Variables....................................16 5. The RHS.................................................17 5.1 Element Designators..................................17 5.2 RHS Patterns.........................................17 5.3 Pre-defined Functions................................18 5.3.1 genatom.........................................18 5.3.2 litval..........................................18 5.3.3 substr..........................................18 5.3.4 compute.........................................18 5.3.5 accept..........................................19 5.3.6 acceptline......................................19 5.4 RHS Actions..........................................19 5.4.1 make............................................20 5.4.2 remove..........................................20 5.4.3 modify..........................................20 5.4.4 write...........................................21 5.4.5 bind............................................21 5.4.6 cbind...........................................21 5.4.7 call............................................22 5.4.8 halt............................................22 5.4.9 openfile........................................22 5.4.10 closefile......................................22 5.4.11 default........................................22 6. Conflict Resolution.....................................24 7. OPS5 Programming........................................26 8. Top Level Commands......................................29 8.1 include..............................................29 8.2 run..................................................30 8.4 ppwm.................................................30 8.5 wm...................................................30 8.6 cs...................................................31 8.7 strategy.............................................31 8.8 pbreak...............................................31 8.9 dump.................................................31 8.8 size.................................................31 9. The OPS5c System........................................32 9.1 Installing OPS5c.....................................32 9.2 Differences from OPS5................................32 9.3 Compiler Switches....................................34 9.4 Run-Time Switches....................................35 9.5 Interfacing With External Code.......................35 9.5.1 Data Types......................................36 9.5.2 The Result Element..............................37 9.5.3 Interning and String Spaces.....................37 9.5.4 Functions Versus Actions........................39 10. References..............................................40 Appendix A - Dollar Library Description.....................41 Appendix B - String Library Description.....................44 Index.......................................................45 OPS5c User's Guide 1. Introduction 1.1 What is OPS5c? OPS5c is a compiled version of the OPS5 expert system language originally developed at Carnegie-Mellon University. It belongs to a class of languages which are referred to as rule- based (or production-based) expert systems. In such systems, knowledge is expressed as a set of rules (also called productions). These rules are meant to express some deductive statements which may be used as applicable in solving a problem requiring an expert level of performance. If you have never programmed in a rule-based language, expect to spend a little time "getting up to speed". The methodologies used in programming such systems are considerably different from programming in conventional programming languages. This manual should serve as a brief overview of the operation of production systems (PS's) in general and the OPS5c programming language in particular. There are several good texts that you may want to consult in order to get better acquainted with production systems and several are specific to the OPS5 language. 1.2 OPS5 History OPS5 is part of a family of languages developed at Carnegie-Mellon University by Charles Forgy, John McDermott, Allen Newell, and Michael Rychener in the mid-70's. The inference engine used by OPS5 is based on the RETE pattern matching algorithm developed by Charles Forgy. The original systems were written in Lisp. They read in OPS5 source code and produced an intermediate pseudo-code which could then be interpreted by the OPS5 inference engine. Much of the OPS5 language semantics indicate the underlying Lisp language used in coding the first system. Another version of OPS5 was written in a language called BLISS. The BLISS system runs considerably faster than the original interpreted Lisp but had some limitations not present in the Lisp version. More recent versions have been written in Common Lisp which run many times faster than the original Lisp version, partially due to better Lisp coding and probably to some extent due to greater variety and power of the data structures and functions available in Common Lisp. The most recent version of OPS5 from CMU is a system called ParaOps which compiles OPS5 source directly to machine language. While being considerably faster than previous OPS5 versions the ParaOps system is very machine specific as it produces assembly language as its output. The OPS5c system was designed to accept a language as close to the OPS5 language definition as possible. As with the BLISS version of OPS5, some language features were not implemented. OPS5c is, however, approximately 98% true to the language definition and the features missing are not likely to be missed by most OPS5 programmers. Unlike previous systems, OPS5c uses the TREAT pattern matching algorithm developed by Dr. D. P. - 4 - OPS5c User's Guide Miranker. In general, the TREAT algorithm has shown to be the superior algorithm in most cases. The OPS5c compiler takes as input an OPS5 source program and produces code in the C programming language as output. This code is then compiled and linked with a run-time library module to produce an executable of the OPS5 program. The main emphasis of this system, besides OPS5 source code compatibility, is speed. Whenever a speed versus space trade-off was encountered during system design, the option producing the fastest execution was almost always selected (within reason of course). A limited amount of testing on small programs done in December 1988 showed that the OPS5c system showed a 30-120% speed advantage on benchmark programs over the then fastest known OPS5 system. Additional speed and space improvements have been made to the system since that preliminary testing occurred. Additional benefits of OPS5c are that it may be interfaced with other languages (primarily C at the moment). The main disadvantage to OPS5c is that it often produces large executables. It is not recommended for machines with less than 1 Meg. of RAM unless only small rule systems are used. 1.3 Production Systems Overview In describing production systems in general, the descriptions will be of systems which have architectures similar to OPS5. OPS5 terminology will be used as much as possible so that any reference to external materials will be more understandable to the reader. As previously mentioned, production systems consist of a set of rules, each of which consists of two parts. These are the antecedent, more commonly known as the left-hand side (LHS), and the consequent, more commonly known as the right-hand side (RHS). The LHS can be thought of as a series of conditions which when all are true simultaneously, make the rule eligible to "fire". A rule may fire when all its LHS parts, called condition elements (CEs), are true and it has been selected as the correct rule to fire. In firing, the rule instructs the system to perform one or more actions as indicated by the RHS. The determination of when a rule may fire as well as what effect it's actions have when it does fire is dictated by working memory (WM). Working memory is best thought of as a set of facts maintained by OPS5. These facts take the form of discrete records similar to structs in the C language. Each such structure belongs to a 'class'. Each class may have attributes, which act as data slots, defined for it. Declarations of valid classes and their attributes must precede the rules definitions in the OPS5 source file. A typical declaration might be: (literalize person name age height weight) We could then begin creating instances of this class to represent individual facts. Even with only one such class - 5 - OPS5c User's Guide definition we might want to represent a set of facts such as: (person ^name John ^age 23 ^height 69 ^weight 165) (person ^name Mary ^age 18 ^height 62) (person ^name Mark ^age 18) Each such instance is called a working memory element (WME). It is these elements which are matched against the patterns of the LHS of each rule. When a rule is fired, statements in the RHS may add to, remove from, or modify the contents of working memory. In general, a production system executes a cycle of phases. The phases of a PS cycle are MATCH, CONFLICT RESOLUTION, and ACT. The MATCH and ACT phases we have mentioned earlier. They corresponding to matching patterns of the LHS to working memory elements and performing the actions of the RHS when a rule is selected to fire. The intermediate phase, CONFLICT RESOLUTION, determines how a rule is selected amongst many possibilities for firing. After the system has performed MATCH against the LHSs of all rules, there may either be more than one rule which has a LHS match or a rule could have one or more ways of producing a match. The first case should be very obvious. It simply means that the facts supporting more than one rule firing exist. It is up to CONFLICT RESOLUTION to decide which rule will fire on the current cycle. In the second case, we may find that there exists more than one way of matching WMEs to the LHS of one or more rules. As an example, given the three WMEs above, the LHS of a rule such as (person ^age > 20) (person ^age < 20) could be matched in two ways. We can form a pair using the first and second WMEs or we can form a pair using the first and third WMEs. Each combination of valid mapping of WMEs to the CEs of a rule constitutes what is called an instance. OPS5 will calculate all the instances it can, then select one instance, and hence the rule that it corresponds to, to fire on the current cycle. As we shall later see, instances may be created, but are never guaranteed to be selected to be fired. Other more appropriate instances may be created during the MATCH phase or it is possible for instances to be destroyed. At any moment in time, if we were to stop the system and examine all the instances we would refer to this set of instances as the conflict set. The CONFLICT RESOLUTION phase examines the conflict set after completing the matching of WMEs added to the system in the ACT phase of the previous cycle. Note that the MATCH phase performs an incremental matching. It needs only perform matching for newly added WMEs since the result of matching produces instances and all instances produced from previously added WMEs have already been calculated and retained in the conflict set. The cycle that we have just described is initiated by adding WMEs to working memory, thus causing matching with LHSs - 6 - OPS5c User's Guide of the rules in the system. The cycle will continue until there are no more instances in the conflict set, hence no rules can fire, or a rule fires which contains an explicit halt instruction in its RHS. Hopefully in either case the system has arrived at a solution to the problem at hand. It should be noted that in our brief description of CONFLICT RESOLUTION, we indicated that the system selects a specific instance to fire. We speak of instances firing rather than rules for several reasons. First, it is important to know exactly what WMEs match particular CEs. This is because is it common practice for an action in the RHS of a rule to reference values in the WMEs which match the LHS CEs. Thus we must know exactly which elements match which CEs in order to know the appropriate value to use in the actions. Second, we associate an instance with a specific rule, so specifying the instance specifies the rule implicitly. As we shall see later when we describe how CONFLICT RESOLUTION is performed, the set of WMEs that form a particular instance determines its "priority" that is calculated by CONFLICT RESOLUTION. All instances are ordered by priority at the end of the MATCH phase and the instance with highest priority is selected to fire. Additionally, we should address the OPS5c matching algorithm at a slightly higher level of detail. These details apply specifically to the TREAT match algorithm used by OPS5c. The traditional match algorithm for OPS5 is the RETE match. Refer to [Miranker87] or other reference materials for additional details on the TREAT and RETE algorithms. Associated with each CE is a data structure called an alpha memory (amem). Such a structure also exists in the RETE algorithm. As we add new WMEs to working memory, we must first decide which CEs the new WME might be associated with in the process of creating instances. A WME can be associated with some CE for the purpose of creating an instance only if it passes all the constant tests as well as all variable binding tests. Constant tests are those whose values are not variables but either numeric or symbolic constants. Variable tests make comparisons relative to other WMEs bound to CEs in the rule. While the variable binding tests depend on what WMEs are currently bound to the other CEs in a rule, the constant tests are independent of other WMEs. Thus it makes sense to perform these tests only once for each CE/WME combination and if the WME passed the constant tests for the CE, we store the WME in the alpha memory of the CE. In that fashion we some other WME is added to another CE of the rule, we can scan the alpha memory structure of each CE to look for candidates for satisfying variable bindings rather than having to scan all of working memory. Usually a WME will match the constant tests of many CEs and therefore will be stored in many alpha memories. We sometimes refer to the MATCH phase as containing two sub-phases, SELECT and JOIN (these terms come from database terminology). The SELECT sub-phase corresponds to performing constant tests for a newly added WME so it can be added to the appropriate alpha memories. The JOIN sub-phase corresponds to the variable binding tests that are performed in order to try to create instances to be added to the conflict - 7 - OPS5c User's Guide set. As a recap, we may describe the incremental matching as such. When a new WME is matched, we examine all CEs which refer to the same class as the WME being added. If all constant tests of a particular CE are satisfied by the WME, we add the WME to the alpha memory of the CE. At this point we may stop and look for correct variable bindings in the rule that the CE participates in. We do this by scanning all the alpha memories of the other CEs to find consistent variable bindings. Note that if we find consistent bindings for all CEs of the rule, we create an instance. Also note that the instance MUST contain the new WME bound to the CE that we have just added the WME to. Actually we add to the alpha memory of a CE, not the CE, but this is mearly terminology and we often use the terms alpha memory, amem, and CE to mean the same thing - namely a storage structure for WMEs which satisfy the constant tests of the CE. - 8 - OPS5c User's Guide 2. Working Memory 2.1 Class Declarations As mentioned earlier, classes and the attributes they use are usually declared before being used in any rules. It is usually best to declare all classes and specify all attributes to be used with each class although neither of these practices are enforced by OPS5. OPS5 will allow the user to reference class names that have not been defined. It will also allow attributes which were defined in one class to be referenced in another class. This is due to the simplistic symbol table kept by OPS5 systems. No record is kept of which classes declared a particular attribute name. Unfortunately this "feature" is one that has been exploited by OPS5 programmers in the past and thus is handled similarly by OPS5c to maintain compatibility. Classes and their attributes are declared using the LITERALIZE statement. As seen in an earlier example, the keyword LITERALIZE is followed by a class name which may be followed by zero or more attribute names. The entire declara- tion must be enclosed in parentheses. It is acceptable to use the same attribute name in different classes but a class name may appear in only one LITERALIZE statement. 2.2 Working Memory Element Structure A working memory element (WME) consists of a time tag and an array of attributes. The time tag is a unique number assigned to each WME at the time of its creation. The more recent the WME the larger its time tag will be. The time tag is used not only as a unique identifier, but is also the key feature in the algorithms used by CONFLICT RESOLUTION. In most OPS5 implementations there are 127 attribute "slots" defined in each WME. Regardless of the number of attributes defined for a class, each WME contains room for 127 attribute values. When the declarations section of an OPS5 program has been parsed by an OPS5 interpreter or compiler, it assigns an offset in this array to each of the attribute names that have been declared. The offset assigned to a particular attribute is the same regardless of the class (or classes) the attribute has been declared in. The algorithm must consider which classes the attribute has been declared in so that it can assign a single offset to an attribute name while not assigning the same offset to attributes which appear in the same class. Because of this it is not valid to assume that the order of attributes in a class declaration has anything to do with the assignment of offsets to those attributes. Assignments may be in a different order and may not be sequential. While most classes will have offsets not used by any attributes declared in that class, the unused offsets may have assigned offsets that are either higher numbered or lower numbered. If a programmer chooses to use attribute names in classes for which they were not declared when writing rules, there is the risk that the offset assigned to two - 9 - OPS5c User's Guide or more attributes used in the same class will be the same. This will usually cause over-writing of data and incorrect program behavior. This is the main reason that declaration of all attributes to be used in a class is a recommended practice. OPS5 also has a mechanism for allowing the programmer to control the offset assignments to certain attributes. This is accomplished with the LITERAL statement. This statement simply assigns offsets to attributes blindly. An example LITERAL declaration follows. (literal name = 2 age = 4 address = 45) Attribute names appearing in a LITERAL statement do not have to appear in any LITERALIZE, though they may. Since LITERAL state- ments are processed before LITERALIZEs (although they may appear intermixed in the source code), the system may find that it cannot provide assignments for LITERALIZEs which allow no duplicate offsets in a class, in which case the system will abort with an error message. Attributes in a WME are similar to Lisp atoms. For those not familiar with Lisp, an atom is a structure with a field indicating the atom type (symbol, integer, float, etc.) and the data value. Because the type is associated with the data value, not with the attribute name or location, we may assign a value with one data type (say an integer) to some attribute, then reassign another value with a different data type (say a string). Certain functions in the OPS5 language, namely calcu- lations and relational comparison predicates, are the only parts of the language that operate on specific types. Attribute types will also be a concern to the programmer who wishes to interface OPS5c programs to external C code. It should be noted that for comparison purposes symbols (strings) must be identical and are case sensitive, i.e. 'dog' is not equal to 'DOG'. Normally OPS5 will consider an integer and floating point value to be equal if their difference, when the integer is converted to float, is zero. In OPS5c numeric attributes must be the same type to be equal regardless of their values. By defining the preprocessor symbol COERCE while compiling the C output of the OPS5c compiler, the system will perform coercion of incompatible data types whenever possible. Whenever a WME is created and assigned values, the attribute positions which are not assigned values explicitly are given a value of "nil" (a symbolic atom). Rules may test for unassigned attribute values by comparing to "nil" in a CE in their LHS. 2.3 Vector Attributes Normally we associate a single value with each attribute. There are cases where we wish to store a set of values that are associated with a single attribute name. OPS5 provides methods for finding the offset of an attribute and referencing attribute slots a certain relative distance past the offset. Thus we can - 10 - OPS5c User's Guide say that the set of values associated with some attribute is all values with an offset equal to or greater than the attribute's offset. In order to may sure that we don't over-write other attribute slots, we may ensure that there are no other attributes (at least in a specific class) that has a greater offset. This is done by declaring the "multi-valued" attribute in the class (or classes) it will be used and additionally declaring it as a vector attribute. One or more attributes may be declared as vector attributes using one or more VECTOR- ATTRIBUTE statements. An example would be: (vector-attribute classmates teachers) It should be noted that only one vector attribute is allowed per class and any attributes declared to be vector attributes will be treated as vector attributes in all classes that they are declared in. The user should be aware that the OPS5c implementation normally will not treat all WMEs as being of equal length. Instead it sizes each class according to the maximum offset of the attributes declared in that class. If the class declaration contains an attribute that has been declared to be a vector attribute, then the class will be sized with the maximum number of attributes that a WME can contain (128 in this implementation). There are ways, however, of forcing all WMEs or WMEs of certain classes to be of specific lengths. See the discussion of the '-w' compiler or run-time switch. This action might be desirable (in fact necessary for correct program execution) if the program being run references attributes in a class for which the attribute was not declared. This is a bad programming practice but the above mentioned override mechanism can compensate. - 11 - OPS5c User's Guide 3. Rules 3.1 Rule Structure Below is a template for the syntactic structure of an OPS5 rule: (p rule-name (class-a ^attr-1 value-1 ^attr-2 value-2 ...) (class-b ^attr-x value-x ^attr-y value-y ...) ... --> (action-a parameters-for-a) (action-b parameters-for-b) ... ) The indentation is optional but is used to emphasize the different parts of a rule. A more terse template for a rule might be: (p rule-name LHS-CEs --> RHS-actions) Rule names may be any string which does not contain any blanks. - 12 - OPS5c User's Guide 4. The LHS 4.1 Condition Elements As previously mentioned, the LHS of a rule is composed of one or more condition elements (CEs). A CE is a set of attribute-value pairs. This is usually an attribute name followed immediately by a value. In OPS5 an attribute name in a CE is distinguished from a value by prefacing a string (representing a declared attribute name) with a caret ('^'). Spaces between the caret and the attribute name are allowed. If a value is not immediately preceded by an attribute name, then the value pertains to the attribute slot immediately following the previously referenced attribute slot. If there are no previously referenced attributes then the value pertains to the first WME attribute, which is always the class name of the WME. Another alternate method of specifying an attribute position is to follow the caret with a numeric value. The numeric value indicates an attribute offset for the WME. 4.2 Variables Instead of specifying an attribute value after an attribute name, an attribute variable may be specified. An attribute value is indicated by enclosing a string in angled brackets. The string may not contain any blanks. An example of an attribute variable is <age>. The first appearance of a variable tells the system to create a variable binding by assigning the attribute value of whatever WME is bound to that CE during the matching process to the indicated variable. Subsequent occurrences of the variable in the rule will substitute the variable value literally for the variable occurrence. The variable may appear in subsequent CEs or in actions in the right hand side. 4.3 Quoting Quoting symbols is a method of using characters that are not normally allowed in symbols or have some other meaning to the system. For example, normally a string with one or more imbedded spaces would be interpreted by the system as multiple symbols rather than a single symbol. Perhaps we wish to include the caret character as the first character of a symbol. Normally the system would interpret any symbol starting with a caret to be an attribute name rather than a value. One method of quoting which is supported both in rules and in the entering of commands from the top level (see section 8) is the use of quote characters to surround the string to be treated as a single symbol. The three quote characters supported are the single quote, the double quote, and the vertical bar. Another method of quoting which is only supported - 13 - OPS5c User's Guide in CEs is by using the OPS5 quote operator which is two consecutive slashes. Using this method, a space must separate the quote operator and the symbol to be quoted. Some examples of quoted symbols are: "Hello world!" '"hi," she said' |That's correct| // ^dog 4.4 Predicates Predicates are relational tests that the user may want performed between the attributes of WMEs and the CE pattern during matching. Predicates are specified by placing the predicate preceding the test value where one would normally specify a value to match against. For example, if we want to test for children under the age of twelve, we might specify '^age < 12' as our attribute-value pair. We may also use variables instead of constants to test the attribute against, thus we could have written '^age < <my-age>'. Seven predicates are defined, equal '=', not equal '<>', less than '<', greater than '>', less than or equal '<=', greater than or equal '>=', and same '<=>'. The same predicate test to see if the attribute and test value have the same type, i.e. if both are numeric or both are symbolic. If the variable has been previously bound, i.e. it either appears in some earlier CE or appears earlier in the current CE, then in the absence of a predicate, equivalence testing is performed with the variable binding rather than creating a new variable binding. 4.5 Conjuctions We use braces ('{','}') when we wish to indicate that there are several tests that we wish a single attribute value to hold true for. For example if we are looking for an age group we might specify '^age {> 20 < 25}'. If we also wish to bind the attribute value to an unbound variable, we should include that variable without specifying a test predicate. Additionally, of course, the variable should not have appeared in a previous CE of that rule (i.e. the variable is unbound). For example: '^age {<her-age> > 20 < 25}'. Another use for braces is as a place holder in a pattern. The pattern (cat {} black) will match WMEs of class 'cat' whose second attribute may be anything and whose third attribute is 'black'. Note that when we say second or third attribute we are referring to the offset numbers assigned to attributes. This may or may not have anything to do with the order that attributes appear in the literalize statements. - 14 - OPS5c User's Guide 4.6 Disjunctions Disjunctions allow the matching of one of a selection of choices. Disjunctions are indicated by enclosing a set of choices in double angled brackets ('<<','>>'). An example would be the CE (cat << black tan yellow >>) which would match all WMEs of class 'cat' whose second attribute is either 'black' or 'tan' or 'yellow'. It should be noted that there is an implicit quoting mechanism in the use of disjunctions. A CE such as (cat << black tan <x> >>) would specify the literal '<x>' as an alternate value for the second attribute rather than match the current binding of the variable x. Hence, variables are not allowed to be bound or tested in a disjunction. 4.7 Negated Condition Elements Condition elements may be negated by placing a minus sign before the rest of the CE. This has the meaning that in order for a rule to fire, we must be able to find a WME that matches each of the positive CEs and there must not exist any WMEs that match the negated CEs. As with positive CEs, negated CEs may contain tests against previously bound variables. In order for an instance to be created, there must exist a WME for each positive CE which satisfies all tests whether they are constants tests or variable bindings that test values between different CEs. When negated CEs are present in a rule we have the additional restriction that for any potential instance which might be created when we consider only positive CEs, we must also not be able to find any WMEs matching the negated CEs. Any WMEs which match a negated CE (for a particular instance) is said to "block" the creation of that instance for the rule. Since negated CEs may contain variable bindings, we must speak of WMEs that match negated CEs with respect to the instance. The instance (considering positive CEs only) defines the variable bindings which can decide which WMEs do or don't match negated CEs. This also implies that variables cannot be bound in negated CEs which might be intuitively expected since the meaning of a negated CE is that something of this pattern couldn't be found. If all WMEs which block an instance are removed from working memory, and those WMEs which match the positive elements remain, then one or more instances will be formed once the last blocking WME has been deleted. (Remember that this does not mean that the instances from this rule will necessarily fire, it just means that its instances are added to the conflict set and are allowed to compete in the selection of the next instance to fire.) Similarly, if a WME is added to working memory and it matches a negated CE for some instance which was previously added to the conflict set but not yet fired, that instance will be removed from the conflict set. Should the blocking WME later - 15 - OPS5c User's Guide be removed and all the WMEs matching positive CEs remain, the instance will be recreated and added to the conflict set. This is the only set of circumstances which allows a unique instance to be created more than once (i.e. it must be blocked between creations). This may also occur even if the instance had been fired. The removal of the blocking WME might still have resulted in a second (or subsequent) recreation of the instance. This is usually not a concern but in some cases the programmer should take into account that such a situation can occur. For convenience, we will use the abbreviations +CE and -CE to stand for positive condition element and negative condition element, respectively. 4.8 Element Variables Just as we might wish to bind an attribute value to a variable for use in RHS actions, we may also wish to specify that some action be taken with respect to the WME matching a particular CE of the rule. We may designate such a WME in one of two ways. The first is simply a reference number, between 1 and the maximum number of CEs per rule (64 in the current implementation), which specifies the CEs appearence position in the LHS. Another method is by associating a name with certain CEs that we may wish to reference. Such a name is called an element variable. This is the preferred method of referencing WMEs from the RHS since deletion and addition of new CEs in the rule can cause position shifts which would affect the numerical element designator, but not an element variable. We specify an element variable by placing a variable name (a symbol enclosed in angled brackets) either before the CE or after the CE and enclosing both the variable and CE in braces. Below are examples of element variables in the LHS. { <THE-CAR> (car ^manufacturer buick ^make <m-name>) } { (person ^hobby tennis) <THE-PLAYER> } We shall see how to use such references when we discuss those RHS actions that operate on WME references. - 16 - OPS5c User's Guide 5. The RHS 5.1 Element Designators Some RHS actions require an element designator to specify a particular WME of the instance being fired that the action is to be performed on. There are two ways of making this specification - an element variable, or a numeric offset. The element variable specification is simply the element variable name (with enclosing angled brackets). The numeric offset specification indicates the number of the WME appearing in the instance. Note that these numbers start at one and -CEs are not counted. Thus if a LHS has a +CE followed by a -CE followed by another +CE, the numeric element designator for the last CE would be two. 5.2 RHS Patterns Patterns in the RHS are similar in form to those of the LHS but have a somewhat different meaning. RHS patterns consist of constants, variables, attribute names, the quote operator ('//'), and function calls. Unlike LHS patterns, which are evaluated at compile time, the evaluation of RHS patterns occurs at run time. The evaluation of RHS patterns has the effect of altering a global buffer area called the result element. The result element (RE) is simply an array of attributes. One can think of it as a global WME which is not actually part of working memory. Associated with the RE is a pointer which keeps track of the current position in the buffer. Terms in the RHS pattern which evaluate to attribute references (names and attribute offsets) have the effect of moving the RE pointer, while terms which evaluate to values are placed in the locations indicated by the RE pointer. When multiple values are placed in the RE without an intervening attribute name or offset reference, the values are placed in consecutive locations starting at the location indicated by the RE pointer. Each time a different pattern is evaluated, the result element is filled with the atom special 'nil' prior to evaluation. As we shall see later, the RE is used not only for pattern evaluation, but for argument passing and value reception with external user- defined functions and actions. All RHS actions consist of an action keyword, followed by a pattern. The pattern is evaluated and the resulting list of instantiated values is used as arguments for the action. Pattern instantiation is quite similar to that in the LHS with the effects of constants, attribute names, and the quote operator being identical. With the exception of the BIND and CBIND actions, variables in RHS patterns are not bound, but are mearly instantiated. Functions do not exist in LHS patterns. In RHS patterns they are indicated by enclosing a list of one or more terms in parentheses. The term following the opening parenthesis must be a constant specifying the function name to - 17 - OPS5c User's Guide be called. The remainder of the list is evaluated and passed to the function as arguments. Function arguments may only be constants or variables, thus no nesting of function calls may be used. Refer to section 5.3 for a discussion of predefined functions and to section 9.5 for a discussion of user-defined functions. Another difference between LHS patterns and RHS patterns is the use of variable attribute names. The term '^<aname>' appearing in the RHS would be interpreted as the attribute name or offset number specified in the variable 'aname'. This syntax is not allowed in LHS patterns. 5.3 Pre-defined Functions All RHS functions return one or more values at consecutive locations in the result element. The values returned start at the current position in the result element. 5.3.1 genatom The genatom function is used to generate a unique symbolic value. This value is placed at the current position in the result element. 5.3.2 litval This function takes a single symbol or variable which represents an attribute name. It returns the offset assigned to it in all WMEs. If the symbol or variable value is not an attribute name, then the function returns zero. 5.3.3 substr This function extracts a sequence of one or more values from a WME and places them in sequential locations in the RE starting at the current position. The first argument to the function is the element designator which indicates the WME to extract from. The second and third arguments indicate attribute offsets which specify the range of attribute values to copy to the RE. Either of these two arguments may be specified as an integer, an attribute name (which is converted to its assigned offset), or a variable that is bound to an integer or attribute name. The order of the range specification arguments does not matter. 5.3.4 compute This function evaluates arithmetic expressions. Expres- sions may contain the operators '+', '-', '*', '//', and '\\'. The latter two are used for division and modulus. Operands may - 18 - OPS5c User's Guide be numeric constants or variables bound to numeric values. Note that compute has no operator precedence and operators are evaluated from right to left (not left to right as one might expect). This may be overriden by the use of parentheses. 5.3.5 accept This function reads input from some input stream. An input stream is usually keyboard input typed by the user, or a file. If called without arguments, accept reads input from the current default input stream. A single optional argument may be supplied which indicates an alternate input stream to read from. The optional argument must be either a symbol or a variable bound to a symbolic value. This string indicates the file handle of the input stream (see section 5.4 for descriptions of accessing file streams using RHS actions or top level commands). If the input begins with an open parenthesis, then a list of values will be read and placed at sequential locations in the RE starting at the current position. Input is read until a closing parenthesis is found. If the first character of the input is not an open parenthesis, then a single input value is read into the RE. If the input function tries to read past the end of a file, then the atom 'end-of-file' is the value returned. 5.3.6 acceptline This function also reads input from a stream, but differs from the accept function in that it reads exactly one line of input, strips out all parentheses, and then places the values read into the RE. Parameters are optional. If any parameters are supplied, they are used to specify the input stream and/or affect the behavior of the function when there are no values in the line input or an end of file condition exists. If the first parameter evaluates to a symbol associated with an input stream, then that parameter acts as a specification of the input stream to read from. If the first parameter is not associated with an input stream, then it is treated the same as subsequent parameters. The remaining parameters, if present, will be placed in the RE if an empty input line is read. 5.4 RHS Actions The actions supported in the RHS for OPS5c are 'make', 'remove', 'modify', 'write', 'bind', 'cbind', 'call', 'halt', 'openfile', 'closefile', and 'default'. These are the same actions supported by OPS5 except that OPS5 also supports a 'build' command that may be used to dynamically add rules to the system at run-time. This is not supported in the OPS5c compiled environment. - 19 - OPS5c User's Guide 5.4.1 make The make command is used to create new WMEs to be added to WM during the MATCH phase of the next cycle. The pattern following the make keyword is instantiated and a new WME is created. Minimally a class name must be given and optional attribute-value pairs or simply values may be supplied. For example, the action: (make person ^name Cindy ^age 25) creates a new WME for the class person with the value 'Cindy' in the attribute slot for 'name' and the value 25 in the attribute slot for age. Similarly to patterns appearing in the LHS, attribute names may be omitted and values are assigned to sequential slot offsets after the last referenced offset. 5.4.2 remove The remove action indicates one or more WME elements to remove. These WMEs are indicates by a list of element designators, which as previously mentioned can consist of either a numeric CE offset (starting at 1) or an element variable name (including the angled brackets surrounding the variable name). It is not an error to remove the same CE twice, but removals after the first are ignored. 5.4.3 modify The modify action can be thought of as a make followed by a remove. The first parameter to modify must be a single element designator. This copies the WME specified to a temporary work buffer where the remaining pattern elements are interpreted as attribute-value pairs that alter the buffer before executing a make command using the buffer values. Once a WME has been modified, the old WME is removed. Since multiple removes are allowed without ill side effects, and removes are not actually processed until the RHS actions have been executed, it is possible to have multiple modifications of a single WME create several new WMEs. For example, a rule such as: (p split-path { <SEGMENT> (segment ^start <s> ^end <e>) } (new-point ^value <p>) --> (modify <SEGMENT> ^end <p>) (modify <SEGMENT> ^start <p>) ) could be used to make two line segments from one by adding a new point in between the lines current start and end points. Either - 20 - OPS5c User's Guide modify command will operate using the original contents of the WME bound to <SEGMENT> and the <SEGMENT> WME will be removed at the end of the ACT phase. 5.4.4 write The write function evaluates a pattern and writes this output to an output stream. If the first argument to write is a symbol (or variable with a symbolic value) that is associated with an output file handle, then the first argument acts as an output stream specifier, otherwise the first argument is treated the same as the rest of the arguments. Each evaluated argument is written to the output stream. There are three special "functions" which can modify the action of write. These are 'tabto', 'rjust', and 'crlf'. The tabto function takes one argument which must evaluate to a numeric quantity. This quantity is the column number that the output cursor is advanced to by printing space characters. If the current output line is already past the column number specified, then a newline character is sent first. The rjust function also takes a single numeric parameter. This indicates a field width. The value following the rjust function will be printed, right justified, in a field whose width is x characters, where x is the value of the numeric parameter. The crlf function takes no arguments. It sends a newline character to the output stream. These three special functions are not really functions at all. They only have meaning with the write action and are referred to as functions because they follow function call syntax. Example: (write (crlf) "The age of" <name> "is" <age> (crlf)) It should be noted that printing of subsequent values always has a space between the values. Thus (write "hello" !) will produce the output 'hello !', not 'hello!'. 5.4.5 bind The bind action is used to bind a functions return value to a variable. Example: (bind <x> (compute <x> * 2)) 5.4.6 cbind The cbind action is used to bind an element variable to the last element that was added to working memory. The addition is usually due to a previous make or modify action, though WMEs may be added by calling external functions. The cbind action takes a single argument, the variable to be bound to. - 21 - OPS5c User's Guide 5.4.7 call The call action is used to invole a user-defined action. The first parameter to call is the name of the user routine. The remaining parameters are pattern elements which are evaluated before being passed as parameters to the routine. Section 9.5 and the Appendicies describe interfacing with user- written C routines and library routines that may be called from these routines. Names of user-defined routines should be declared (with class declarations) by using the external declaration. Example: /* In declaration section */ (external DrawLine) ... /* Some rule RHS */ (call DrawLine <x> <y>) 5.4.8 halt The halt action terminates program execution irregardless of the state of working memory or the conflict set. 5.4.9 openfile The openfile action opens a file for reading or writing and assigns a file handle to a symbol for future reference to that file. The first parameter supplied to openfile is the name to be used as a file handle in subsequent references. The second parameter is the file name, usually in quotation marks. The third and final parameter is one of two keywords, either 'in' or 'out'. These indicate if the file is being opened in read or write mode, respectively. 5.4.10 closefile The closefile action is used to close files previously opened with the openfile action. It takes the file's handle as a single argument. 5.4.11 default The default action determines where various output streams created by OPS5c are directed or where the input stream is read from. The three streams used by OPS5c are 'trace', 'write', and 'accept'. The trace stream is where any trace information, such as which rule fired on each cycle, is directed. Various levels of trace information, including no trace information, can be provided as the system executes. See the description of the top level 'watch' command in section 8 for information on trace levels. The write stream is where output from the write action is directed. The accept stream is the input stream where the - 22 - OPS5c User's Guide accept and acceptline functions read input from. One of the more common uses of default is to redirect trace information to a file. The default action takes two parameters, a file handle and a stream designator which must be one of 'trace', 'write', or 'accept'. Example: (openfile trace-output "foobar.trace" out) (default trace-output trace) /* run program */ (closefile trace-output) - 23 - OPS5c User's Guide 6. Conflict Resolution The purpose of conflict resolution is to choose one instantiation from the conflict set which is to fire on the current cycle. There are many criterion which may drive conflict resolution in a system like OPS5, but the two major driving forces in OPS5 conflict resolution strategies are recency and specificity. These are discussed in more detail below. The conflict set may be viewed as a list of possible assertions that may be made on a particular cycle. A single instance can be thought of as a set of facts and the justification (rule) which allows new facts to be asserted (the RHS actions). Once a particular instance has been selected for firing by conflict resolution, we must remove that instance from the conflict set, otherwise it may fire again and cause a condition known as trivial cycling where the same rule fires repeatedly. When a new assertion has been made, say adding a new WME to the working memory, it is frequently the case that this new element will be the basis for further inferences. Most systems will have a large search space of possible solutions to the problem being solved and the most common system behavior is to find one of many possible solutions rather than all solutions. this implies that in scanning parts of the solution space, we wish to traverse the tree of selection possibilities in a depth- first fashion as opposed to a breadth-first fashion. This is done by assuming that given two facts to reason on, the most recent fact is probably the better one to base assertions on. This heuristic is known as recency and is a strong driving force in the OPS5 conflict resolution strategies. To support recency, each WME in the system is assigned a time tag. This is a unique integer number given to the element at the time of its creation. Part of the conflict resolution strategy is to look at the time tags of all the WMEs that make up each instance, and order the instances according to the recency of the WME time tags associated with it. Another heuristic used in conflict resolution is that of specificity. Specificity assumes that if two rules are very similar, but one either contains more condition elements in the LHS or more tests for the same number of condition elements in the LHS, then the rule with more CEs/tests must be a special case of the other, more general, rule. Specificity demands that we execute the rule with more CEs/tests first. There are two conflict resolution strategies in OPS5, LEX and MEA. They are fairly simple, with MEA being a modification of LEX. We shall describe the LEX strategy first. The criteria for comparing two instances is used to order the conflict set. First, the time tags of the positive CEs are ordered in descending order. The first time tag in each series is compared. The instance with the greater tag is the more recent and is thus preferred over the other one. If the time tags are the same, then the second tags in each list are compared. This continues until a difference in the lists is noted. If the - 24 - OPS5c User's Guide lists are of unequal length, then the longer list is preferred, provided the pair-wise comparisons of tags do not favor the shorter list before its elements are exhausted. This provides for selection of the more specific rule (the one with more CEs has the longer list). If two lists are the same length and have the same elements, then the number of tests (both constant tests and variable binding tests) are compared and the rule containing the most tests is selected. This also favors the more specific rule. If the number of tests are equal then an arbitrary selection if made. The MEA strategy has the same rules as the LEX strategy with one change. We first compare the time tags associated with the first CE of each instance's rule. If that comparison is equivalent, then we order the remaining time tags and perform the pair-wise comparisons. This modification exists because the first CE of a rule is often a context clause which can define when a rule is eligible to fire. More will be said about context clauses in section 7. - 25 - OPS5c User's Guide 7. OPS5 Programming This section is not intended to teach you all the tricks of programming in OPS5. There are books devoted to this subject. Instead, this section is intended to give you an overview of the type of structures you can expect to see in typical OPS5 programs and some of the issues that should be of concern when you begin writing your own OPS5 programs. The novice programmer will find this section important to understanding rule-based programming in OPS5 while those who have some knowledge of OPS5 will probably already know most of the points outlined here. By its nature, rule based programming is data driven. Input data in the form of assertions active certain rules, which may make new assertions activating different rules and so on. A program execution is a constant cycle of modifying facts and making and retracting assertions via rule firing. This is quite a bit different from the declarative programming languages that most of use use such as C, Fortran, Pascal, etc. In conventional languages we specify an algorithm or a sequence of steps to perform. We know that A will be done before B because we have written the program such that the function that performs A will be called before the function that performs B. In contrast, in a rule based system we make statements that when certain facts hold true, it is appropriate to assert that the addition or deletion of facts in the database are appropriate should a given rule be selected to fire. Usually we don't know when a rule will fire, the number of times it should fire, or even if it will fire. If our logic manipulating the facts is correct then we shouldn't care. While this seems like a desirable trait, there are pitfalls. First, the methods that we use to solve a problem can almost never be totally data driven. Each method we use can be broken up into parts or sections which solve subtasks of the larger problem. Many times it is necessary to solve one subtask before considering the solutions to other subtasks. For an efficiency standpoint it is not reasonable to incur the execution overhead involved in maintaining a state of reasoning for some rule if we know that rule does not pertain to the current subtask the system is trying to solve. Not only is it inefficient to perform such unnecessary calculation, it may be incorrect to do so. If we have no mechanism to disallow a rule from firing, it may activate and fire prematurely making an incorrect assertion based on incomplete data. To prevent both undesirable actions, we have certain implementation dependent rule evaluation procedures and also allow the programmer to use such features to control the flow of the program execution somewhat. We wish the system to do the minimum amount of work necessary to correctly match rules that may fire with facts. It should be obvious that if we do not have at least one fact in each alpha memory for the +CEs in a rule, then we cannot create an instantiation for that rule. If the rule does not have at least one WME in each of its +CEs, then the rule is considered inactive and the only computation done is to place new WMEs - 26 - OPS5c User's Guide being added to working memory into the appropriate CEs for that rule. We do not attempt to calculate variable bindings because we know that at least one CE cannot have anything bound to it, thus such calculations would be useless. This simple implementation rule allows a programmer to greatly increase both program efficiency as well as control. This is accomplished by the programmer's use of context clauses. Normally, a context clause is a class declared by the programmer which describes the current goal or focus of the program. Normally such a class is given a name like 'goal' or 'context'. Each rule in the system (or at least most of them) will contain one CE specifying the context in which that rule is eligible to fire. If the context is currently set to something else, and we have already specified that there is only one context element in the system at a time, then the first CE of that rule will have nothing in it and we will not attempt variable binding computations for the rule when WMEs are added to the other CEs of the rule. The context specifies that the current goal, and nothing but the current goal, will be worked on by the system. Rules not pertaining to solving the current goal are given a minimal amount of attention by the system. Note that when we recognize that it is appropriate to change the system goal, we must modify the existing goal element. Frequently, this will cause a large amount or work to be done, because we will destroy all the instance that the old goal element participated in and there are often many rules that the creation of a new goal element will activate and perform variable binding calculations for. For this reason, we can see that it would be very inefficient to code a rule based program with a large number of rules pertaining to a few goals and switch between the goals frequently. Each goal switch could force a large amount of work to be done. One trick that is common in OPS5 programming pertains to switching contexts or goals. Often we may desire to execute some task (a specific goal or context) until no more rules in that context can fire. Then we wish to change the goal. To do this we rely on the OPS5 conflict resolution strategy. Remember that in resolving which instance is to fire on the current cycle, we use time ordered tag lists. If we exhaust one list while doing the comparison between two lists, then the longer list wins. This is the same as saying that when comparing instances between two rules, given identical time tags participating in both rules, the rule with the most CEs will win the conflict resolution. We can therefore write rules to switch context by relying on the fact that such a rule will never fire as long as a more specific rule (one with more CEs) can fire. As an example, suppose that some part of our algorithm has produced a set of guesses, we select one guess to take action on, then we wish to delete the non-selected guesses, perhaps with the intent of creating a new set later. We might see rules such as: - 27 - OPS5c User's Guide (p remove-guesses (goal ^value delete-guesses) { <THE-GUESS> (guess) } --> (remove <THE-GUESS>) ) (p process-guess-start { <THE-GOAL> (goal ^value delete-guesses) } --> (modify <THE-GOAL> ^value process-guess) ) We know that although the second rule will be active through this subtask, it cannot fire unless the is no possibility of firing the first rule. Hence we will achieve our goal of deleting the old guesses before moving on the the task of processing the one selected. - 28 - OPS5c User's Guide 8. Top Level Commands Top level commands are the interpreted commands that the user types in. The top level commands used in OPS5c are very similar to those used in the lisp-based OPS5. Some commands are unsupported while other commands are added to OPS5c to enhance its usability. When a user executes an OPS5c program, a few lines of system identification are printed followed by the prompt 'Top Level> '. At this point the user can type any of the commands listed in this section. Minimally, the user must do something to initiate processing by the OPS5c system. Initially there is no information in the working memory. In general there are three ways that a user can initiate program execution once the initial top level prompt is given. He can created WMEs using the top level make command. He may use the make command to create a start symbol. Or he may use the include command to add many WMEs to working memory. The first two methods are the same techniques used in OPS5. If the user knows that certain WMEs need to be created at the start of every run, he can create an initialization rule whose RHS makes the necessary elements. The LHS of such a rule usually depends on the existance of a single element of some class such as 'start'. The last method allows the user to specify a path name to a file containing make statements. This file will be interpreted to create the WMEs needed. Note that unlike OPS5, the act of making a WME mearly means that we have created storage representing the WME and correctly initialized the WME's attribute values. The WME will not actually be inserted into working memory until a cycle is run. Thus if you type make commands from the top level prompt, do not expect to see any instances created or insertions into alpha memories until you have issued the run command. The same effect will be noticed when using the remove command. Instances that may be created as a result of removing a WME will not be observed until one or more cycles of the system are executed following the issuing of the remove command. Many RHS actions are also supported as top level commands with nearly identical syntax. The main limitations imposed by using the top level command is that you cannot use variable references, the quote operator ('//'), and cannot use functions as arguments. Commands with these characteristics include make, remove, openfile, closefile, default, and call. For the remove command, instead of supplying a list of element designators to delete, the user supplies a list of time tag elements to be deleted. If called with an asterisk as the only argument, the command will delete all of working memory. The following sub- sections describe the remaining top level commands. Those specific to OPS5c are indicated as such. 8.1 include (OPS5c only) This command is used to read a text file containing make - 29 - OPS5c User's Guide commands. Appropriate WMEs are created but are not added to working memory until at least one cycle has been run after the include command. The include command takes a file name as the single parameter. The file name may be quoted using double quotes, single quotes, or vertical bars. Example: (include "auto-diagnose.input") 8.2 run This command starts executing the match/resolve/act cycles. Optionally an integer parameter may be given to run to indicate the maximum number of cycles to run before returning to the top level. This is particularly useful for debugging where you may want to examine the state of the system after a certain number of cycles and before the program has completed executing. 8.3 watch This command tells the system how much trace information to print out as the system executes. Watch takes a single numeric parameter which must be 0, 1, 2, or 3. A watch level of zero indicates that no trace information is to be output. The only output from the program will be that generated by write statements in the RHS of rules that fire or from externally called procedures. The default watch level of one will print a cycle number, rule number, rule name and time tag list for each instance that is fired. A watch level two provides the watch level one information in addition to indicating when working memory elements are made or removed. The WME itself is printed preceded by '>>' if the element is being added to working memory or '<<' if the element is being removed from working memory. A watch level of three will indicate watch level two information in addition to showing instances being added and removed from the conflict set. 8.4 ppwm The ppwm command is used to print WMEs. The command is followed by a pattern which can only consist of constants and attribute names. The system will scan all of working memory for the specified pattern and will print out each WME (including its assigned time tag) that matches the pattern. When no arguments are given to ppwm, all of working memory is printed. 8.5 wm The wm command is similar to the ppwm command except that instead of specifying a pattern as the arguments of the command, a list of time tags is specified. Similar to ppwm the command prints all of working memory if it is called with no arguments. - 30 - OPS5c User's Guide 8.6 cs This command prints out the contents of the conflict set. The dominant instance (the instance to fire next) is indicated as such. 8.7 strategy This command selects the conflict resolution strategy. It takes either 'lex' or 'mea' as a single argument, or if called with no arguments it prints the current conflict resolution strategy. 8.8 pbreak This command can set break points on specific rules. When called with a rule name it will set a break point on that rule if one is currently unset or remove a break point for that rule if one is currently set. Having a break point set for a particular rule means that the system will return to the top level immediately after firing an instance of that rule. Issuing the command with no arguments prints the names of all rules which currently have break points set. 8.9 dump (OPS5c only) This command prints out the contents (WME time tags) of each of the alpha memories. 8.10 size (OPS5c only) This command prints out the number of WMEs in the working memory. The number of WMEs in each class is listed separately. - 31 - OPS5c User's Guide 9. The OPS5c System 9.1 Installing OPS5c Each C compiler has directories that it searches for standard include files when compiling and standard libraries when linking. By placing the appropriate OPS5c files in these places, you may use the OPS5c compiler just as you would your C compiler. The compiler itself is called 'ops5c'. You may place this somewhere in your execution path if you desire (C: or elsewhere). The header file needed to compile the output from the OPS5c compiler is called 'o5c.h'. Place this file with other standard headers such as 'stdio.h'. The OPS5c run-time library is called 'ops5c.lib'. Place this with your standard libraries such as 'c.lib'. The commands for compiling the sample program 'tourney.ops' using the Aztec C compiler is given below: ops5c test_files/tourney.ops cc tourney1.c ln -o tourney tourney1.o -lops5c -lm -lc Note that to produce the very fastest executables you may turn on optimization when compiling the C output file, but you will probably find that this has little effect other than making your compile take longer. It is far more important to optimize the library code. The code provided in the distribution was created using Aztec C 5.0 with complete optimization turned on (-so). 9.2 Differences from OPS5 OPS5c was designed to be as compatible as possible with existing OPS5 programs. Complete compatibility is virtually unobtainable since OPS5c is compiled and typically OPS5 is interpreted. Other differences may be observed due to implementation details of underspecified features of the language. The following list is a summary of the most important differences that you may encounter when running OPS5c. o The pm, matches, excise, and back top level commands of OPS5 are not implemented in OPS5c. o The top level command dump is used to examine the alpha memory contents. It takes no arguments and prints out the contents of all alpha memories. o The top level command size is used to count the number of elements in each class and add them to determine the entire size of working memory. o The top level include command works similarly to the - 32 - OPS5c User's Guide load command found in some Lisp implementations of OPS5. It accepts a file name as a parameter. The indicated file must contain only make statements which will add new elements to working memory. o Several of the '$' functions used by OPS5 do not make sense in a non-Lisp environment. Additions have also been made to the $functions (see Interfacing With External Code). o It may or may not be possible to provide execution of OPS5c which is identical to that of OPS5 for programs that rely on arbitrary selection of instances. This occurs when two are more instances are considered equivalent by the conflict resolution rules and they exist simultaneously in the conflict set. See Run-Time Switches for a possible fix to this problem. o OPS5c does not normally perform coercion between numeric data types. You can make the system perform coercion by defining the preprocessor symbol COERCE when compiling the <program>1.c file. This will substitute a function call to perform type checking and coercion for the equals and not-equals tests rather than use the macros that are normally used. o OPS5 specifies that all WMEs should contain the same number of attributes, usually 128 in most implementations. In order to conserve space on small systems, OPS5c sizes all WMEs of a particular class according to the maximum offset of the attributes declared in the class. (Note that this is not the same as the number of attributes in the class since one attribute may have a large offset assigned to it.) This may be overriden with the -w switch at either compile- time or run-time. o OPS5c supports modular compilation. A large system may be compiled into multiple C source code files, then linked together to form an executable. This is useful for systems with small amounts of RAM or compilers that cannot handle large symbol tables. It often speeds the compilation process as well. o It should be noted that in this implementation the maximum number of CEs per rule is 64. It is STRONGLY recommended that you limit the number of CEs per rule. (No more that 10 CEs per rule is suggested for performance reasons.) Large rules should be broken up into smaller ones as necessary. - 33 - OPS5c User's Guide 9.3 Compiler Switches There are five switches accepted by the OPS5c compiler. These are -t, -s, -c, -o and -w. The most commonly used switch is -s. This switch controls the ordering method used by the compiler. A separate join code sequence is generated for each CE of each rule. There are two methods used for ordering the scan order of the join. Seed ordering uses the CE being added to as the first CE in the join ordering with the remaining CEs being examined in lexical order. Heuristic ordering is based on seed ordering but tries to examine CE dependencies to produce a more efficient join order of the non-seed CEs (those remaining after the seed CE has been selected). Normally heuristic ordering is used since it normally produces results as good or better than seed ordering. The -s switch indicates that seed ordering should be used. One item of concern is that the heuristic ordering algorithm will perform exhaustive search of all orderings in the worst case. This only happens for rules with highly connected CEs (CEs which have common variable bindings). If worst case behavior is observed, the number of orderings examined is proportional to the square of the number of CEs in a rule, therefore the compilation time for rules with a large number of connected CEs may be very high. The compiler prints an asterisk to the screen upon completion of each rule compiled. If compilation appears to be very slow then the user should consider aborting the compile and recompiling with the -s switch. The -c and -o switches are used in conjunction with creating multiple output modules. Multiple output modules may be necessary on some systems to overcome certain software or hardware limitations. When multiple output modules become necessary, the compiler produces a series of files named '<file>#.c', where the # sign is a numeric sequence number. It also produces a script file called 'comp.sh' which can be used to compile and link the multiple modules. Multiple modules may be advantageous for systems with limited memory which may not be able to compile an entire large rule system. They may also be necessary for linkers which have hard set limits on the number of symbols a single object module may have. Normally the compiler counts functions produced for join operations and functions produced for executing right-hand-side actions. These functions will constitute symbols present in object modules. The compiler has a cut-off setting which indicates when a module has too many symbols. If compiling an addition rule into the current module will exceed the symbol cut-off, then the current module is terminated and a new module is started. The default symbol-cutoff value is 3000. The user may specify a new cut-off value by using the -c switch with a numeric value for the symbol cut-off immediately following. To suppress module creation and produce only one module, use the -o switch. The -t switch is used to specify tracefile execution. Tracefile execution is used to run benchmarks comparing OPS5c with other systems. A tracefile of some expert system run is produced with the other system using a watch level 2 setting - 34 - OPS5c User's Guide (with the watch command). This tracefile can be read by OPS5c to determine appropriate RHS actions after each conflict resolution step. In this way we do not have to worry about recoding external functions used in RHS actions in order to run OPS5c, the tracefile will determine RHS actions. Note that this method may give poor results since the cost of interpreting the tracefile may be considerably more or considerably less than that observed had external function calls to C routines been used. To use tracefile execution, follow the -t switch with the path name of the tracefile. The -w switch allows the user to specify the size of the WMEs created at run-time. It applies to all WMEs rather than to specific classes. Its use is primarily for programs using undeclared classes or attribute specifications in classes that the attributes were not declared in. The switch is followed by an integer number from 1 to 128 which indicates the number of attribute slots the WMEs will contain. This number must be as large as the largest offset of any class or the program will not execute correctly. Programs whose class declarations contain all attributes used by references of that class do not have to worry about setting the -w flag as the system will correctly size each class. 9.4 Run-Time Switches The executable produced by compiling and linking the OPS5c output files will accept three switches at run-time, -r0, -r1 and -w. The -r switches are used to reverse the sense of resolving arbitrary comparisons in the conflict resolution step. When two equivalent instances are compared the system can either retain its current best choice, or swap the best choice with the newly encountered equivalent one. Whether or not the system performs the swap when this equivalence arises is determined by these switches. The -r0 switch controls intra-rule comparisons, i.e. comparisons between instances produced by the same rule. The -r1 switch controls inter-rule comparisons - comparisons between instances produced by different rules. The -w switch works identically to its compile-time counterpart. If used, it overrides variable sizing of classes and will also override any setting created by use of the -w compile-time switch. 9.5 Interfacing With External Code OPS5c has the capability of interfacing with externally written C code. Actually the type of code that is interfaced to is irrelevant. It may be FORTRAN or assembler as long as the translators output a common object format understood by the linker. All external programs must use the interface defined by OPS5c for passing parameters and results between the expert system and the external routines. Additionally, the external routines must have some understanding of the data types and - 35 - OPS5c User's Guide symbol tables used by the OPS5c system in order to correctly interface with the expert system. 9.5.1 Data Types The OPS5c system has the attribute as its simplest unstructured data type. An attribute is actually synonymous with the term atom as used in the Lisp programming language and both terms will be used interchangably. An atom is actually many different data types. An atom is a structure containing two parts, the storage location for a data type, and a flag indicating what type of data is actually being stored there. An atom is similar to the concept of a union in the C language, and in fact is implemented as such: (as defined in o5c.h) typedef union { float fval; int ival; StrID string; } Data; typedef struct attribute { unsigned char type; Data data; } Attr; The types currently defined are: AT_FLOAT : a floating point value AT_INTEGER : an integer value AT_SYMBOL : an interned string (see interning below) If the user wished to create an attribute to represent the constant integer 5, he might code it as: Attr five; five.type = AT_INTEGER; five.data.ival = 5; When coding external functions, the programmer must keep in mind that attributes may be of any type and change their type dynamically as new values are assigned. No assumptions should be made about attribute types. Library functions exist for comparing attribute values. If these are not used, the programmer is responsible for type checking and coercion. Most functions that interface with OPS5c will require pointers to attributes (typedef'd as AttrID) as parameters. It will usually be necessary for external routines to declare attributes, initialize them correctly, and use the appropriate functions for comparison, assignment, and retrieval of attribute values. - 36 - OPS5c User's Guide 9.5.2 The Result Element The result element is a global buffer used to pass parameters between OPS5c and external routines. The result element (RE) is an array of attributes whose size is that of the largest WME supported by the implementation (128 attributes). Values of attributes may be copied to the RE or copied from the RE using a library of functions known as the dollar library. This name arises from the fact that the Lisp implementation names functions which operate on the RE using names that start with a dollar sign ($reset, $value, etc.). Many of these names have been kept, where appropriate, but the functions are named dollar_reset, dollar_value, etc. since C function names cannot contain a dollar sign. Appendix A describes the functions in the dollar library. 9.5.3 Interning and String Spaces The term interning comes from the Lisp environment and is the primary difference between strings and symbols. A string, as defined by the C programming language, is simply a sequence of characters terminated by a null character. We often refer to a string or consider its value to be the pointer to the first character of the string. Such a pointer is used to perform comparisons between two strings on a character by character basis to determine equality, inequality, or other relationships. In OPS5c we are interested in the relationships of equality and inequality between attributes with non-numeric data types. If we were to store non-numeric information as strings and perform character by character matching, the testing process would be very slow. Instead, we form a data structure known as a symbol table. Using such a structure we can store strings (known as symbols) and look up their pointers by a process known as interning. To intern a string, we call the intern function with the string as an argument. The intern function will scan the symbol table looking for an identical string (i.e. symbol). If such a symbol is found then the address of the symbol is returned as the value of the function. If no such symbol is found, the string is copied into the symbol table, thus creating a new symbol, and its address is returned as the value of the interning function. This is done for all non-numeric attribute values in the system. The main reason for this is that once a string has been interned, i.e. entered into the symbol table, we may compare two symbols by comparing their addresses, since if the original strings were identical, then the interning process will return identical addresses. This greatly speeds up comparison of non-numeric data and also results in less storage requirements since we store the address as the value of an interned symbol. As a result, if an external routine wishes to manipulate symbolic data such that the OPS5c inference engine can match properly, attribute values going into and coming out of the RE must be represented as interned symbols. The dollar library - 37 - OPS5c User's Guide contains a very useful function, dollar_intern, for converting a string into an interned symbol. To fully interface with the system, the programmer must be aware of a more generic library known as the strings library. Functions in the strings library aid in manipulating values interned as symbols. The strings library operates on string spaces which are equivalent to symbol tables. The distinction and hence reason for different names, is that the OPS5c system keeps several different string spaces (symbol tables). The most often used string space is that for attribute values. It is called 'attr_values' and should be referenced as such when calling functions in the string library. The functions in the string library are listed in Appendix B. The most commonly used functions are AddString() and StringAvail(). The AddString() function performs interning in a string space. The StringAvail() function simply tests for the existance of a symbol in a string space. Unlike AddString(), StringAvail() will not add a new symbol if it is not already there. Note that the string library functions return pointers of type String Entry. The String Entry data type is defined below. typedef struct { HT_Entry_Header link; StrID str_id; } String_HT_Header; typedef String_HT_Header *StringEntry; Frequently the user will wish to dereference the StringEntry to obtain the StrID of the returned value. The StrID is not a string pointer and should be treated as a handle to the symbol. The GetString() macro can be used to return the string associated with a StrID. Another string space of interest is 'attr_names'. It contains symbols for all the declared attribute names known by the system. The programmer should not attempt to add to the attr_names string space, only to read from it. StringEntry's retrieved from this name space should be cast to (struct attr_off *) where: struct attr_off { String_HT_Header link; int offset; }; Such a pointer can be dereferenced by ptr->offset to retrieve the offset (slot position in the RE) associated with a particular attribute name. This is a common occurrance when we wish to access values by attribute name which is the most common method. - 38 - OPS5c User's Guide 9.5.4 Functions Versus Actions There are two types of external code that the user can interface to an OPS5c program, functions and actions. The distinction between these two forms of interface in the OPS5 language is vague. Functions must return a single value in the result element. This is accomplished with the dollar_value() call. One and only one call to dollar_value() must be made by a function. The OPS5 language does not specify how argument passing to functions are implemented. For predefined functions in-line code or calls to routines in the run-time library are produced. For user-defined functions, arguments are passed via a global array args[] which is an array of attribute values (atoms). A global counter, fargc, is used to indicate the number of valid arguments in args[]. This is similar to the argc and argv[] method of passing command line arguments to C programs. Once parameters are loaded into the args[] array, the external function is called. A C function integer argument, which is the number of actual parameters, is passed to the function. Unlike functions, actions may utilize all the functions defined in the dollar library. An action may return zero or more values. These values are usually returned in the result element using the dollar_value() function. Actions are passed arguments through the result element. Before calling the action the result element is cleared (filled with nil atoms) and then it is filled with the evaluated pattern that comprises the parameters. No modification of the result element is performed after the action has finished with it. Note that the OPS5 language specifies that all external functions and actions must be declared with the external statement. In OPS5c this is not a requirement, however, actions which are not declared using external cannot be called from the top-level environment. - 39 - OPS5c User's Guide 10. References Forgy81 Forgy, Charles L. , OPS5 User's Manual, Department of Computer Science, Carnegie-Mellon University. CMU-CS-81-135 Miranker87 Miranker, D.P., TREAT: A Better Match Algorithm for AI Production Systems: Long Version, Artificial Intelligence Laboratory, University of Texas at Austin, Technical Report AI TR-87-58 - 40 - OPS5c User's Guide Appendix A - Dollar Library Description /* Possible return value of dollar_litbind() */ #define NOT_DEFINED -1 /************************************************************ This routine creates a new wme from the result element. ************************************************************/ void dollar_assert(class_index) int class_index; /************************************************************ This function takes an atom (attribute id) and returns its integer value. ************************************************************/ int dollar_cvan(atom) AttrID atom; /************************************************************ This function takes an integer and returns a numeric atom (interned). ************************************************************/ Attr dollar_cvna(num) int num; /************************************************************ This function is used to test the equality of two atoms (attributes). ************************************************************/ BOOLEAN dollar_eql(atom1,atom2) AttrID atom1; AttrID atom2; /************************************************************ This function places a float value in the result element. The value of the float is returned. ************************************************************/ float dollar_flt_val(value) float value; - 41 - OPS5c User's Guide /************************************************************ This function determines if a file of specified name has been opened for input. ************************************************************/ FILE * dollar_ifile(attr) AttrID attr; /************************************************************ This function places an integer value in the result element. The value of the integer is returned. ************************************************************/ int dollar_int_val(value) int value; /************************************************************ This function is used to intern a string into the attr_values string table. ************************************************************/ StrID dollar_intern(str) char *str; /************************************************************ This function determines if its string argument is an attribute name that has been assigned an offset at compile time. If so, the offset is returned. If not NOT_DEFINED is returned. ************************************************************/ int dollar_litbind(str) char *str; /************************************************************ This function returns an attibute (Attr, not an AttrID) when given an index into the result element ************************************************************/ Attr dollar_parameter(index) int index; /************************************************************ This function returns a count of the number of elements in the result element. ************************************************************/ int dollar_parametercount() - 42 - OPS5c User's Guide /************************************************************ This routine clears the result element (fills it with nil) and resets the result element index. ************************************************************/ void dollar_reset() /************************************************************ This function places a string value in the result element. A StrID must be passed. (See strings module.) The StrID is returned as the result. ************************************************************/ StrID dollar_str_val(value) StrID value; /************************************************************ This predicate determines if an atom (attribute) is symbolic. ************************************************************/ BOOLEAN dollar_symbol(x) Attr x; /************************************************************ This routine sets the index of the result element. Indicies are numbered starting at 1. The argument passed must be integer. ************************************************************/ dollar_tab1(num) int num; /************************************************************ Similar to dollar_tab1() above except that the argument passed is an AttrID. The attribute may be numeric (integer or float) or a numeric string. ************************************************************/ void dollar_tab2(attr_id) AttrID attr_id; /************************************************************ This routine takes an attribute structure (not an AttrID) and places it in the result element at the current position. The pointer is then incremented to the next position. ************************************************************/ dollar_value(attr) Attr attr; - 43 - OPS5c User's Guide Appendix B - String Library Description /************************************************************ * This function "interns" a string and returns its ID code. In * this implementation version, the ID is a memory address. ************************************************************/ StringEntry AddString(str,string_table,struct_size) char *str; HashTable string_table; int struct_size; /************************************************************ * This function checks to see if the indicated string already * exists in the specified string table. If so, the StringEntry * of the string is returned, else NULL is returned. ************************************************************/ StringEntry StringAvail(str,string_table) char *str; HashTable string_table; /************************************************************ * This is a macro which can be used with a StrID to return the * pointer to the C string representation of the interned * string. ************************************************************/ char * GetString(str_id) StrID str_id; - 44 - OPS5c User's Guide Index // 14, 17-18, 29 accept stream 22 ACT 2-3, 5-13, 16-24, 26-30, 34-37, 39 actions 2-3, 5-7, 12-13, 16-17, 19-20, 24, 26, 29, 34-35, 39 AddString() 38 alpha memory 7-8, 26, 32 amem 7-8 atom 2, 10, 17-19, 36, 39, 41, 43 Attr 2, 5, 9-18, 20, 29-30, 33, 35-39, 41-43 attribute 2, 5, 9-11, 13-18, 20, 29-30, 33, 35-39, 41-43 attribute variable 13 attribute-value pair 13-14, 20 AttrID 36, 41-43 attr_names 38 attr_values 38, 42 call 2, 4-7, 16-19, 21-22, 26, 29-39 case 5-7, 10, 16, 24, 34 CE 2-43 class 2, 4-5, 8-15, 20, 22, 27, 29, 31-33, 35, 41 closefile 2, 19, 22-23, 29 coercion 10, 33, 36 comp.sh 34 condition element 2, 5, 13, 15-16, 24 conflict resolution 2, 6-7, 9, 24, 27, 31, 33, 35 conflict set 6-7, 15-16, 22, 24, 30-31, 33 conjuctions 2, 14 constant tests 7-8, 25 context clause 25, 27 contexts 27 crlf 21 default 2, 19, 22-23, 29-30, 34 disjunctions 2, 15 dominant instance 31 element designator 2, 16-18, 20, 29 element variable 2, 16-17, 20-21 functions 2-4, 10, 17-18, 21, 23, 29, 33-39 heuristic ordering 34 incremental matching 6, 8 instance 5-8, 15-17, 24-25, 27, 29-31, 33, 35 interning 3, 36-38 JOIN 7, 34 join code 34 LEX 24-25, 31, 34 LHS 2, 5-7, 10, 12-13, 16-18, 20, 24, 29 literal 5, 9-10, 13-15 literalize 5, 9-10, 14 make 2, 5, 7, 16, 19-21, 24, 26, 29, 33 MATCH 4, 6-8, 13-16, 20, 26, 30, 32, 37, 40 MEA 4, 6, 8, 13, 15, 17, 21, 24-25, 29, 31 modular compilation 33 modules 34 nil 10, 17, 39, 43 - 45 - OPS5c User's Guide o5c.h 32, 36 offsets 9-10, 17-18, 20 openfile 2, 19, 22-23, 29 ops5c.lib 32 predicates 2, 10, 14, 43 quoting 2, 13, 15 RE 2-44 recency 24 remove 2, 6, 15-16, 19-21, 24, 28-31 result element 3, 17-18, 37, 39, 41-43 RETE 4-5, 7, 9, 13, 18, 20, 29, 32 RHS 2, 5-7, 12, 16-20, 22, 24, 29-30, 35 rjust 21 rule 2, 4-10, 12-16, 19-20, 22, 24-31, 33-35 seed ordering 34 SELECT 5-7, 15, 24-28, 31, 33-34 specificity 24 StrID 36, 38, 42-44 string space 3, 37-38 StringAvail() 38 StringEntry 38, 44 strings 10, 37-38, 43 subtasks 26 symbol table 9, 33, 36-38 symbols 10, 13-14, 34, 37-38 tabto 21 time tags 9, 24-25, 27, 29-31 top level 2, 13, 19, 22, 29-32 trace stream 22 TREAT 4-5, 7, 11, 13, 19, 21, 38, 40 variable binding tests 7, 25 vector attributes 2, 10-11 WM 2, 5-11, 13-18, 20-21, 24, 26-27, 29-31, 33, 35, 37, 41 WME 6-11, 13-18, 20-21, 24, 26-27, 29-31, 33, 35, 37, 41 Working memory 2, 5-7, 9, 15, 17, 21-22, 24, 27, 29-33 working memory element 2, 6, 9, 30 write stream 22 \\ 18 - 46 -