- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!paladin.american.edu!auvm!CCB.BBN.COM!BNEVIN
- Return-Path: <@VMD.CSO.UIUC.EDU:bnevin@ccb.bbn.com>
- Message-ID: <CSG-L%93012107351583@VMD.CSO.UIUC.EDU>
- Newsgroups: bit.listserv.csg-l
- Date: Thu, 21 Jan 1993 08:26:22 EST
- Sender: "Control Systems Group Network (CSGnet)" <CSG-L@UIUCVMD.BITNET>
- From: "Bruce E. Nevin" <bnevin@CCB.BBN.COM>
- Subject: MDL refs from Boulanger
- Lines: 139
-
- [From: Bruce Nevin (Thu 930121 08:24:53)]
-
- Here is a bibliographical followup by Albert Boulanger, FYI.
-
- Bruce
- bn@bbn.com
-
- -=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-
- Date: Wed, 20 Jan 93 16:54:02 -0800
- From: Michael Pazzani <pazzani@ics.uci.edu>
-
- Machine Learning List: Vol. 5 No. 1
- [ . . . ]
- ----------------------------------------------------------------------
-
- To: ml@ics.uci.edu
- Subject: MDL references
- From: aboulang@bbn.COM
- Date: Sat, 9 Jan 93 19:49:29 EST
-
-
- Several folk, in response to my posting, asked for some references on
- MDL-like methods. Some of the references are incomplete, so perhaps you
- can help *me* complete them.
-
-
- First off, one can probably say that the spirit of MDL (stochastic
- complexity) methods was present in the early work on algorithmic
- complexity theory -- especially Solomonoff's work on a general
- methodology for inductive inference. The big problem with algorithmic
- complexity is that it is not computable.
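- (The practical two-part MDL idea is computable, though. Below is a
- minimal sketch, not from the references above: pick the model that
- minimizes parameter cost plus data cost, against a literal encoding as
- the baseline. The 0.5*log2(n)-bits-per-parameter cost follows
- Rissanen's asymptotic form; the data and function names are invented
- here for illustration.)

```python
import math

def two_part_length(bits):
    """Two-part MDL code length (in bits) for a binary string
    under a Bernoulli model: parameter cost + data cost."""
    n = len(bits)
    k = sum(bits)
    p = k / n
    # Data cost: -log2 of the likelihood under the fitted parameter.
    data = 0.0
    if 0 < p < 1:
        data = -(k * math.log2(p) + (n - k) * math.log2(1 - p))
    # Parameter cost: (1/2) log2 n bits per free parameter.
    param = 0.5 * math.log2(n)
    return param + data

def literal_length(bits):
    """Baseline: transmit the string literally, one bit per symbol."""
    return float(len(bits))

biased = [1] * 90 + [0] * 10    # highly compressible: model pays off
fair   = [1, 0] * 50            # p = 0.5: the model buys nothing

print(two_part_length(biased) < literal_length(biased))  # model wins
print(two_part_length(fair) < literal_length(fair))      # model loses
```

- The biased string codes in roughly 50 bits against 100 literal bits;
- for the fair string the parameter cost makes the model strictly worse.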
-
- ****************
-
- Akaike's early work on an MDL-like metric for model selection is
- notable in that work has been done to formally relate it to
- cross-validation, asymptotically:
-
- "An Asymptotic Equivalence of Choice of Model by Cross-Validation and
- Akaike's Criterion", M. Stone, ????, 44-47
-
-
- Akaike did not take sample size into account in his metric.
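- (For concreteness, here is a minimal sketch of AIC-style selection,
- not drawn from Stone's paper: the form n*ln(RSS/n) + 2k is the
- Gaussian least-squares special case of AIC, and the data and models
- below are invented for illustration.)

```python
import math
import random

random.seed(1)

# Synthetic data from a true line y = 2x + 1 with Gaussian noise.
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.3) for x in xs]
n = len(xs)

def rss_constant(xs, ys):
    """Residual sum of squares for a mean-only model (k = 1)."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def rss_line(xs, ys):
    """Residual sum of squares for a least-squares line (k = 2)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def aic(rss, n, k):
    # Gaussian least-squares form of AIC: n ln(RSS/n) + 2k.
    return n * math.log(rss / n) + 2 * k

aic_const = aic(rss_constant(xs, ys), n, k=1)
aic_line  = aic(rss_line(xs, ys), n, k=2)
print(aic_line < aic_const)  # the line model should win here
```

- The line model's fit improvement swamps its extra parameter penalty,
- so AIC selects it; note the penalty term 2k has no dependence on n,
- which is the sample-size issue mentioned above.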
-
- ****************
- There is a pair of back-to-back papers in the Journal of the Royal
- Statistical Society B (1987), by Jorma Rissanen and then C.S. Wallace &
- P.R. Freeman, on their contributions to MDL-like metrics:
-
-
- "Stochastic Complexity", Jorma Rissanen, J. R. Statist. Soc B(1987)
- 49, No 3, pp 223-239, and 252-265.
-
- "Estimation and Inference by Compact Coding", C.S. Wallace & P.R.
- Freeman, J. R. Statist. Soc B(1987) 49, No 3, pp 240-265.
-
- These two papers are of interest along with the discussion section on
- pages 252-265. The crux of the issue (besides the silly "who published
- first" dispute) is Rissanen's use of a "universal" prior. Wallace does
- not use one, being a die-hard believer in priors. (Personally, I feel
- that the choice of prior can often lead to radically different answers,
- and any attempt at making a more "robust" method should be welcomed.
- This observation comes from some work I did on a health-risk assessment
- program: I realized that misinformation in the system can really screw
- things up in a normal Bayesian framework -- we should seek a more
- robust approach.)
-
- Two other references to Rissanen are:
-
- J. Rissanen, "Universal Coding, Information, Prediction and
- Estimation," IEEE Trans. Inform. Theory, vol 30, pp 629-636, 1984
-
- & his book:
-
- J. Rissanen, "Stochastic Complexity in Statistical Inquiry", World
- Scientific, N.J., 1989
-
-
- ****************
-
- An application of MDL to *unsupervised* clustering is:
-
- "Finding Natural Clusters Having Minimum Description Length"
- Richard S. Wallace & Takeo Kanade, IEEE ???, 438-442, 1990
-
- ****************
- MDL is the basis for pruning methods for tree classifiers (*supervised*):
-
- J. R. Quinlan and R. Rivest, "Inferring Decision Trees Using the
- Minimum Description Length Principle," Information and Computation,
- 80, 227-248, 1989.
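- (This is not Quinlan & Rivest's exact coding scheme, but the pruning
- idea can be sketched with a toy exception code, all counts and costs
- invented here: keep a split only if the bits it saves on
- misclassified examples pay for its structure.)

```python
import math

def code_length(errors, n):
    """Bits to name the exceptions among n examples: send the error
    count, then which subset of that size the errors are."""
    if errors == 0:
        return math.log2(n + 1)
    return math.log2(n + 1) + math.log2(math.comb(n, errors))

def prune_cost(class_counts):
    """Cost of replacing a subtree with a majority-class leaf."""
    n = sum(class_counts)
    errors = n - max(class_counts)
    return code_length(errors, n)

def keep_cost(child_counts, structure_bits=2.0):
    """Cost of keeping the split: structure bits plus coding each
    child as a majority-class leaf."""
    return structure_bits + sum(prune_cost(c) for c in child_counts)

# A split that separates the classes well: keeping it is cheaper.
pure_children = [(40, 2), (3, 35)]
print(keep_cost(pure_children) < prune_cost([43, 37]))

# A useless split: pruning to a single leaf is cheaper.
noisy_children = [(20, 18), (21, 21)]
print(keep_cost(noisy_children) < prune_cost([41, 39]))
```

- The informative split cuts the exception-coding cost from roughly 83
- bits to under 40, while the noisy split costs more to keep than to
- prune.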
-
- ****************
-
- As I mentioned in my short note, one can push the use of MDL earlier,
- into the generation phase of machine learning programs. In this paper
- it is used for both growing and pruning the decision tree:
-
- "Construction of Tree Structured Classifiers by the MDL Principle",
- Mati Wax, ICASSP (??) Proceedings, M7.10, 2157-2160, 1990.
-
- ****************
-
- Padhraic Smyth has applied it to model selection for Markov random
- fields, decision tree classifiers, and multilayer feedforward neural
- nets:
- "Admissible Stochastic Complexity Models for Classification
- Problems", ??? 26.1-26.8
-
- See also:
-
- "A General Selection Criterion for Inductive Inference"
- M.P. Georgeff & C.S. Wallace, Advances in Artificial Intelligence,
- T. O'Shea (ed.), Elsevier. ECCAI, 1985.
-
- ****************
-
- Finally, a really neat thesis of George Hart:
-
- "Minimum Information Estimation of Structure"
- MIT Lab. of Information and Decision Science, LIDS-TH-1664, April 1987.
-
- His main development is the inference of FSMs from strings. The main
- application is a really neat inverse problem -- inferring the different
- electrical loads of a house solely from a recording off a load meter
- external to the house.
-
- Wallace also applied it to figuring out patterns in stone circles.
-
-
- ***************
-
- Again, I am sorry for the incomplete references.
-
- Regards,
- Albert Boulanger
- aboulanger@bbn.com
-
- [ . . . ]
-