- Xref: sparky comp.ai:3311 comp.ai.neural-nets:3375 sci.math.num-analysis:2621 sci.math.stat:1782
- Path: sparky!uunet!decwrl!access.usask.ca!ccu.umanitoba.ca!ciit85.ciit.nrc.ca!ciit85.ciit.nrc.ca!news
- Newsgroups: comp.ai,comp.ai.neural-nets,sci.math.num-analysis,sci.math.stat
- Subject: REQUEST: Multi-algorithm Machine Learning system?
- Message-ID: <1992Sep2.104535.1196@ciit85.ciit.nrc.ca>
- From: Dick Jackson <Dick_Jackson@ibd.nrc.ca>
- Date: 2 Sep 92 10:45:29 +0600
- Distribution: world
- Organization: National Research Council Canada
- Nntp-Posting-Host: jackson.ibd.nrc.ca
- X-UserAgent: Nuntius v1.1.1d9
- X-XXDate: Wed, 2 Sep 92 16:43:26 GMT
- Lines: 66
-
- Gentle readers,
-
- Our Informatics group has been discussing the need for a software
- system for doing Multivariate Analysis, primarily for classification
- and clustering tasks, making techniques from different areas of Machine
- Learning available to the user.
-
- What I would like to know is: has something of this kind been developed
- already? Many excellent individual machine-learning programs are
- available from different sources, but has anyone made a system which
- allows combination and comparison of different algorithms?
-
- In more detail, we would want the system to include:
-
- A. Pre-processing of raw data file:
- - provide a means for choosing a sequence of options such as:
- - normalizing selected variables
- - filling in missing data
- - creating new variables from existing ones
- - performing transforms via Principal Component Analysis, etc.
- - splitting data into 'training set' and 'test set'
- - saving resulting dataset with pre-processing details
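A minimal sketch of three of the pre-processing options in part A (mean imputation of missing values, z-score normalization of selected columns, and a seeded train/test split), assuming records are Python lists with `None` marking a missing value. The function names and the step `log` are hypothetical; the log illustrates one way to save "pre-processing details" alongside the dataset. PCA and derived variables are omitted to keep it short.

```python
import random
import statistics
from typing import List, Optional, Tuple

Row = List[Optional[float]]  # one record; None marks a missing value


def fill_missing(rows: List[Row], cols: List[int], log: list) -> List[Row]:
    """Mean imputation: replace None in each selected column with that column's mean."""
    out = [list(r) for r in rows]  # copy so the raw data is left untouched
    for c in cols:
        mean = statistics.mean(r[c] for r in out if r[c] is not None)
        for r in out:
            if r[c] is None:
                r[c] = mean
        log.append(("fill_missing_mean", c, mean))
    return out


def normalize(rows: List[Row], cols: List[int], log: list) -> List[Row]:
    """Z-score each selected column: (x - mean) / population std dev.
    Assumes those columns have no missing values (run fill_missing first)."""
    out = [list(r) for r in rows]
    for c in cols:
        values = [r[c] for r in out]
        mean, sd = statistics.mean(values), statistics.pstdev(values)
        for r in out:
            r[c] = (r[c] - mean) / sd if sd > 0 else 0.0  # constant column -> all zeros
        log.append(("zscore", c, mean, sd))
    return out


def split_train_test(rows: List[Row], test_fraction: float, seed: int,
                     log: list) -> Tuple[List[Row], List[Row]]:
    """Shuffle with a fixed seed, then hold out test_fraction of the rows as a test set."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in order[:n_test]]
    train = [rows[i] for i in order[n_test:]]
    log.append(("split", test_fraction, seed))
    return train, test


# Usage: each step appends to the log, which could be saved with the resulting dataset.
data = [[1.0, 10.0], [None, 20.0], [3.0, 30.0], [5.0, None]]
log: list = []
clean = fill_missing(data, [0, 1], log)
clean = normalize(clean, [0, 1], log)
train, test = split_train_test(clean, test_fraction=0.25, seed=0, log=log)
```

Fixing the seed makes the split reproducible, so the same training and test sets can be regenerated from the saved log.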
-
- B. Dataset Analysis:
- - starting from the dataset above, allow any of a number of types of
- analysis, each of which results in a clustering or classification
- 'system', such as:
- - inductive learning, giving a decision tree or other representation
- - connectionist, giving a trained neural net
- - LDA, genetic algorithms, fuzzy clustering...
- (incorporating software from willing sources)
- - parameters/options for these analyses can be numerous, but
- heuristics can give a good first attempt at parameter choice
- - interactive displays may be needed for some
- - end results to be saved for future use
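One way to let part B offer "any of a number of types of analysis" is to give every technique the same small interface, so the rest of the workbench can train, compare and combine them without knowing their internals. The sketch below is hypothetical: `Analysis`, `REGISTRY` and the `fit`/`predict` method names are assumptions, and nearest-centroid classification is a deliberately simple stand-in, not one of the techniques listed above.

```python
import math
from typing import Dict, List, Protocol, Sequence, Type


class Analysis(Protocol):
    """Hypothetical common interface that every analysis plug-in would implement."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "Analysis": ...

    def predict(self, x: Sequence[float]) -> str: ...


class NearestCentroid:
    """Stand-in analysis: assign x to the class whose mean vector is closest (Euclidean)."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, v in enumerate(row):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        # Per-class mean of each variable.
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict(self, x: Sequence[float]) -> str:
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


# A GUI could list these names; each entry builds a fresh, untrained analysis.
REGISTRY: Dict[str, Type] = {"nearest centroid": NearestCentroid}

X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
y = ["healthy", "healthy", "disease", "disease"]
model = REGISTRY["nearest centroid"]().fit(X, y)
```

A decision-tree inducer or a neural net would be added the same way: a class with `fit` and `predict`, plus one registry entry.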
-
- C. Test/Use of classifier systems:
- - pass test data through resulting classifier systems, giving reports
- of accuracy, sensitivity, specificity, etc.
- - pass new unclassified data through classifiers, giving predicted
- classes (with confidence estimates?)
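For a two-class problem, the report in part C comes from four counts: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). Accuracy is (TP + TN) / N, sensitivity is TP / (TP + FN), and specificity is TN / (TN + FP). A minimal sketch, assuming labels are strings and one of them is designated the positive class; `binary_report` is a hypothetical name.

```python
from typing import Dict, Sequence


def binary_report(actual: Sequence[str], predicted: Sequence[str],
                  positive: str) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity, treating `positive` as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")  # undefined when the denominator is 0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # fraction of actual positives detected
        "specificity": ratio(tn, tn + fp),  # fraction of actual negatives correctly rejected
    }


report = binary_report(
    actual=["disease", "disease", "healthy", "healthy", "healthy"],
    predicted=["disease", "healthy", "healthy", "disease", "healthy"],
    positive="disease",
)
```

Here TP = 1, FN = 1, TN = 2 and FP = 1, so accuracy is 3/5, sensitivity 1/2 and specificity 2/3.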
-
- D. Meta-analysis:
- - based on reports from the previous stage, devise more robust
- classification systems incorporating multiple techniques
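One simple way to build on the stage C reports in part D is a weighted vote: each trained classifier votes for a class, and its vote counts in proportion to a weight such as its test-set accuracy. This is only one possible combination scheme, sketched under the assumption that each classifier is a function from a feature vector to a class label; the classifiers and weights below are illustrative stand-ins, not real results.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Classifier = Callable[[Sequence[float]], str]  # any trained system: features -> class label


def weighted_vote(classifiers: List[Classifier], weights: List[float],
                  x: Sequence[float]) -> str:
    """Combine classifiers by adding each one's weight to the class it predicts."""
    scores: Dict[str, float] = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        scores[clf(x)] += w
    # max() keeps the first maximum, so ties go to the class that received a vote first.
    return max(scores, key=scores.__getitem__)


# Hypothetical stand-ins for a decision tree, a neural net and an LDA model.
def tree(x: Sequence[float]) -> str:
    return "disease" if x[0] > 0.5 else "healthy"


def net(x: Sequence[float]) -> str:
    return "disease" if x[1] > 0.2 else "healthy"


def lda(x: Sequence[float]) -> str:
    return "healthy"


members = [tree, net, lda]
weights = [0.80, 0.75, 0.60]  # illustrative accuracies of the kind stage C would report
```

With these weights, an input on which only the tree says "disease" is outvoted (0.80 against 0.75 + 0.60), while agreement between the tree and the net carries the decision.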
-
- The target user could be a medical researcher, not a programmer, so a
- clear graphical user interface is of high importance.
-
- It is clear that much of part A is already available in some of the better
- statistical packages, but what about the machine learning techniques?
- Is anyone developing a multiple-technique 'workbench' like this? If
- not, we might be interested in starting up such a project.
-
- I welcome any comments on this topic; please reply to me directly. If
- there is enough interest, I will post a summary, so please tell me
- which newsgroup you read this in.
-
- Thanks for your time,
-
- -Dick
-
- Dick Jackson
- Institute for Biodiagnostics National Research Council Canada
- Winnipeg, Manitoba Dick_Jackson@ibd.nrc.ca
- Any opinions: Mine alone!
-