home *** CD-ROM | disk | FTP | other *** search
- Xref: sparky comp.ai:2919 comp.ai.shells:373 comp.lang.prolog:1471
- Path: sparky!uunet!mcsun!uknet!edcastle!aisb!aifh!xindongw
- From: xindongw@aisb.ed.ac.uk (Xindong Wu)
- Newsgroups: comp.ai,comp.ai.shells,comp.lang.prolog
- Subject: HCV: A New Efficient Attribute-Based Induction Algorithm
- Keywords: rule induction, extension matrix approach
- Message-ID: <1992Jul29.160052@aifh.ed.ac.uk>
- Date: 29 Jul 92 15:00:52 GMT
- Sender: news@aisb.ed.ac.uk (Network News Administrator)
- Reply-To: xindongw@aisb.ed.ac.uk (Xindong Wu)
- Followup-To: ai.msgs,comp.sources.misc
- Organization: Dept of AI, University of Edinburgh, Scotland
- Lines: 142
-
-
- HCV: A New Efficient Attribute-Based Induction Algorithm
- --------------------------------------------------------
-
- 1. Introduction
-
- The HCV induction algorithm is invented by Xindong Wu in the Department of
- Artificial Intelligence at the University of Edinburgh (UK) and the Department
- of Computer and Information Engineering at Hefei University of Technology
- (China). It is a representative of the extension matrix approach based family
- of attribute-based induction algorithms, originating with J.R. Hong's AE1.
-
- By dividing the positive examples (PE) of a specific concept in a given example
- set into intersecting groups and adopting a set of strategies to find a
- heuristic conjunctive rule in each group which covers all the group's positive
- examples and none of the negative examples (NE), the HCV algorithm can find a
- rule in the form of variable-valued logic for the concept based on PE against
- NE in low-order polynomial time. If there exists at least one conjunctive rule
- in a given training example set for PE against NE, the rule produced by the HCV
- algorithm must be a conjunctive one. The rules in variable-valued logic
- generated by the HCV algorithm have been shown empirically to be more compact
- than the decision trees or their equivalent decision rules produced by the ID3
- algorithm (the best-known induction algorithm to date) and its successors in
- terms of the numbers of conjunctive rules and conjunctions.
-
- The term ``HCV'' in this prospectus indicates the current implementation
- (Version 1.0) of the HCV algorithm in SICStus Prolog which runs on SUN3 and
- SPARC workstations. In this implementation, HCV can classify more than 2
- classes of examples by incorporating the AQ technique developed in the
- generalization-specialization strategy based family of induction algorithms.
- It takes a set of pre-classified training examples (vectors of
- attribute-values)
- as its input and produces a set of rules as its output classifying the
- training
- examples. It also allows you to evaluate the rules' accuracy in terms of a
- set
- of pre-classified testing examples.
-
- 2. Inputs
-
- HCV's examples are vectors of attribute-values. Values can be either symbolic
- or numeric. Don't Cares (denoted by `*') are permitted.
-
- 3. Outputs
-
- The output is an unordered set of rules in the form of the variable-valued
- logic
- adopted in the generalisation specialisation strategy based AQ family of
- induction algorithms.
-
- 4. User interaction
-
- To use the program, you must prepare your training and testing examples in
- the
- form of ASCII files in a fixed format. During the program execution, all you
- need to do is provide your file names. All outputs and some intermediate
- results
- are given on the screen.
-
- 5. Application domains
-
- HCV works with any classification problems (such as diagnosis, prediction,
- and
- decision-making) that fit its input grammar.
-
-
- 6. Example
-
- Input: Cases of T and F
-
- attrilist(['X1','X2','X3','X4']).
- eg([1,a,*,1],'F').
- eg([1,a,a,0],'F').
- eg([1,b,c,1],'T').
- eg([0,b,b,0],'T').
- eg([0,a,c,1],'T').
- eg([1,b,a,*],'T').
- eg([1,c,c,0],'F').
- eg([1,c,b,1],'F').
- eg([0,c,b,0],'T').
- eg([0,a,a,0],'T').
- eg([0,c,c,1],'F').
- eg([0,c,a,0],'T').
- eg([1,a,b,0],'F').
- eg([0,a,a,1],'T').
- eg([0,b,a,1],'T').
-
- Output: Rules in variable-valued logic
-
- [ X2=[b] ]
- v [ X2=[c,a] ]
- [ X1=[0] ] [ X1=[1] ]
- [ X2=[a] ] v
- v [ X2=[c] ]
- [ X1=[0] ] [ X4=[1] ]
- [ X4=[0] ] -->
- --> The F class.
- The T class.
-
- 7. Main references
-
- [1] Xindong Wu, Optimization Problems in Extension Matrixes, Science in China
- (Series A), English edition: 35}(1992), 3: 363--373; Chinese edition:
- 35(1992), 2: 200-207.
- [2] Xindong Wu, HCV: A Heuristic Covering Algorithm for Extension Matrix
- Approach, DAI Research Paper No. 578, Department of Artificial
- Intelligence,
- University of Edinburgh, 1992.
- [3] Xindong Wu, Inductive Learning: Algorithms and Frontiers, Artificial
- Intelligence Review}, 6(1992).
-
- 8. Availability
-
- HCV (Version 1.0) is available electronically for academic use at no cost,
- and
- for commercial use by arrangement with the author. It may not be
- redistributed.
- To obtain a copy, send e-mail to 'xindongw@uk.ac.ed.castle' (the recipient
- must
- sign a condition-of-use agreement).
-
- For people in the Artificial Intelligence Department at Edinburgh, there are
- 2
- programs, 'HCV_Sun3' and 'HCV_Sparc', in the ~xindongw/HCV directory for SUN
- 3
- and SUN 4 (SPARC) workstations respectively. You can run HCV by simply
- entering
- ~xindongw/HCV/HCV_Sun3 or ~xindongw/HCV/HCV_Sparc according to the machine
- you
- are using.
-
- All the files with the suffix _data in the same directory with the HCV
- program(s) are example sets which can be used as training and/or testing
- examples to run HCV.
-
- 9. For further information, please contact
-
- Xindong Wu
- Department of Artificial Intelligence Telephone: +44 (0)31 650 2727
- University of Edinburgh Fax: +44 (0)31 650 6516
- 80 South Bridge Email: xindongw@uk.ac.ed.castle
- Edinburgh EH1 1HN, U.K.
-