- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!paladin.american.edu!auvm!CCB.BBN.COM!BNEVIN
- Return-Path: <@VMD.CSO.UIUC.EDU:bnevin@ccb.bbn.com>
- Message-ID: <CSG-L%93012107351583@VMD.CSO.UIUC.EDU>
- Newsgroups: bit.listserv.csg-l
- Date: Thu, 21 Jan 1993 08:26:22 EST
- Sender: "Control Systems Group Network (CSGnet)" <CSG-L@UIUCVMD.BITNET>
- From: "Bruce E. Nevin" <bnevin@CCB.BBN.COM>
- Subject: MDL refs from Boulanger
- Lines: 139
-
- [From: Bruce Nevin (Thu 930121 08:24:53)]
-
- Here is a bibliographical followup by Albert Boulanger, FYI.
-
- Bruce
- bn@bbn.com
-
- -=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-
- Date: Wed, 20 Jan 93 16:54:02 -0800
- From: Michael Pazzani <pazzani@ics.uci.edu>
-
- Machine Learning List: Vol. 5 No. 1
- [ . . . ]
- ----------------------------------------------------------------------
-
- To: ml@ics.uci.edu
- Subject: MDL references
- From: aboulang@bbn.COM
- Date: Sat, 9 Jan 93 19:49:29 EST
-
-
- Several folk, in response to my posting, asked for some references on
- MDL-like methods. Some of the references are incomplete, so perhaps you
- can help *me* complete them.
-
-
- First off, one can probably say that the spirit of MDL (stochastic
- complexity) methods was present in the early work on algorithmic
- complexity theory -- especially Solomonoff's work on a general
- methodology for inductive inference. The big problem with algorithmic
- complexity is that it is not computable.
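- (The practical two-part MDL idea is computable, though. Below is a
- minimal sketch, not from the references above: pick the model that
- minimizes parameter cost plus data cost, against a literal encoding as
- the baseline. The 0.5*log2(n)-bits-per-parameter cost follows
- Rissanen's asymptotic form; the data and function names are invented
- here for illustration.)

```python
import math

def two_part_length(bits):
    """Two-part MDL code length (in bits) for a binary string
    under a Bernoulli model: parameter cost + data cost."""
    n = len(bits)
    k = sum(bits)
    p = k / n
    # Data cost: -log2 of the likelihood under the fitted parameter.
    data = 0.0
    if 0 < p < 1:
        data = -(k * math.log2(p) + (n - k) * math.log2(1 - p))
    # Parameter cost: (1/2) log2 n bits per free parameter.
    param = 0.5 * math.log2(n)
    return param + data

def literal_length(bits):
    """Baseline: transmit the string literally, one bit per symbol."""
    return float(len(bits))

biased = [1] * 90 + [0] * 10    # highly compressible: model pays off
fair   = [1, 0] * 50            # p = 0.5: the model buys nothing

print(two_part_length(biased) < literal_length(biased))  # model wins
print(two_part_length(fair) < literal_length(fair))      # model loses
```

- The biased string codes in roughly 50 bits against 100 literal bits;
- for the fair string the parameter cost makes the model strictly worse.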
-
- ****************
-
- Akaike's early work on an MDL-like metric for model selection is
- notable in that work has been done to formally relate it to
- cross-validation, asymptotically:
-
- "An Asymptotic Equivalence of Choice of Model by Cross-Validation and
- Akaike's Criterion", M. Stone, ????, 44-47
-
-
- Akaike did not take sample size into account in his metric.
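- (For concreteness, here is a minimal sketch of AIC-style selection,
- not drawn from Stone's paper: the form n*ln(RSS/n) + 2k is the
- Gaussian least-squares special case of AIC, and the data and models
- below are invented for illustration.)

```python
import math
import random

random.seed(1)

# Synthetic data from a true line y = 2x + 1 with Gaussian noise.
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.3) for x in xs]
n = len(xs)

def rss_constant(xs, ys):
    """Residual sum of squares for a mean-only model (k = 1)."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def rss_line(xs, ys):
    """Residual sum of squares for a least-squares line (k = 2)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def aic(rss, n, k):
    # Gaussian least-squares form of AIC: n ln(RSS/n) + 2k.
    return n * math.log(rss / n) + 2 * k

aic_const = aic(rss_constant(xs, ys), n, k=1)
aic_line  = aic(rss_line(xs, ys), n, k=2)
print(aic_line < aic_const)  # the line model should win here
```

- The line model's fit improvement swamps its extra parameter penalty,
- so AIC selects it; note the penalty term 2k has no dependence on n,
- which is the sample-size issue mentioned above.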
-
- ****************
- There is a pair of back-to-back papers in the Journal of the Royal
- Statistical Society B (1987), by Jorma Rissanen and then C.S. Wallace &
- P.R. Freeman, on their contributions to MDL-like metrics:
-
-
- "Stochastic Complexity", Jorma Rissanen, J. R. Statist. Soc B(1987)
- 49, No 3, pp 223-239, and 252-265.
-
- "Estimation and Inference by Compact Coding", C.S. Wallace & P.R.
- Freeman, J. R. Statist. Soc B(1987) 49, No 3, pp 240-265.
-
- These two papers are of interest along with the discussion section on
- pages 252-265. The crux of the issue (besides the silly "who published
- first" dispute) is Rissanen's use of a "universal" prior. Wallace does
- not use one, being a die-hard believer in priors. (Personally, I feel
- that the choice of prior can often lead to radically different answers,
- and any attempt at making a more "robust" method should be welcomed.
- This observation comes from some work I did on a health-risk assessment
- program: I realized that misinformation in the system can really screw
- things up in a normal Bayesian framework -- we should seek a more
- robust approach.)
-
- Two other references to Rissanen are:
-
- J. Rissanen, "Universal Coding, Information, Prediction and
- Estimation," IEEE Trans. Inform. Theory, vol 30, pp 629-636, 1984
-
- & his book:
-
- J. Rissanen, "Stochastic Complexity in Statistical Inquiry", World
- Scientific, N.J., 1989
-
-
- ****************
-
- An application of MDL to *unsupervised* clustering is:
-
- "Finding Natural Clusters Having Minimum Description Length"
- Richard S. Wallace & Takeo Kanade, IEEE ???, 438-442, 1990
-
- ****************
- MDL is the basis for pruning methods for tree classifiers (*supervised*):
-
- J. R. Quinlan and R. Rivest, "Inferring Decision Trees Using the
- Minimum Description Length Principle," Information and Computation,
- 80, 227-248, 1989.
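- (This is not Quinlan & Rivest's exact coding scheme, but the pruning
- idea can be sketched with a toy exception code, all counts and costs
- invented here: keep a split only if the bits it saves on
- misclassified examples pay for its structure.)

```python
import math

def code_length(errors, n):
    """Bits to name the exceptions among n examples: send the error
    count, then which subset of that size the errors are."""
    if errors == 0:
        return math.log2(n + 1)
    return math.log2(n + 1) + math.log2(math.comb(n, errors))

def prune_cost(class_counts):
    """Cost of replacing a subtree with a majority-class leaf."""
    n = sum(class_counts)
    errors = n - max(class_counts)
    return code_length(errors, n)

def keep_cost(child_counts, structure_bits=2.0):
    """Cost of keeping the split: structure bits plus coding each
    child as a majority-class leaf."""
    return structure_bits + sum(prune_cost(c) for c in child_counts)

# A split that separates the classes well: keeping it is cheaper.
pure_children = [(40, 2), (3, 35)]
print(keep_cost(pure_children) < prune_cost([43, 37]))

# A useless split: pruning to a single leaf is cheaper.
noisy_children = [(20, 18), (21, 21)]
print(keep_cost(noisy_children) < prune_cost([41, 39]))
```

- The informative split cuts the exception-coding cost from roughly 83
- bits to under 40, while the noisy split costs more to keep than to
- prune.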
-
- ****************
-
- As I mentioned in my short note, one can push the use of MDL earlier,
- into the generation phase of machine learning programs. In this paper
- it is used for both growing and pruning the decision tree:
-
- "Construction of Tree Structured Classifiers by the MDL Principle",
- Mati Wax, ICASSP (??) Proceedings, M7.10, 2157-2160, 1990.
-
- ****************
-
- Padhraic Smyth has applied it to model selection for Markov random
- fields, decision tree classifiers, and multilayer feedforward neural
- nets:
- "Admissible Stochastic Complexity Models for Classification
- Problems", ??? 26.1-26.8
-
- See also:
-
- "A General Selection Criterion for Inductive Inference"
- M.P. Georgeff & C.S. Wallace, Advances in Artificial Intelligence,
- T. O'Shea (ed.), Elsevier. ECCAI, 1985.
-
- ****************
-
- Finally, a really neat thesis of George Hart:
-
- "Minimum Information Estimation of Structure"
- MIT Lab. of Information and Decision Science, LIDS-TH-1664, April 1987.
-
- His main development is the inference of FSMs from strings. The main
- application is a really neat inverse problem -- inferring the different
- electrical loads of a house solely from a recording off a load meter
- external to the house.
-
- Wallace also applied it to figuring out patterns in stone circles.
-
-
- ***************
-
- Again, I am sorry for the incomplete references.
-
- Regards,
- Albert Boulanger
- aboulanger@bbn.com
-
- [ . . . ]
-