Path: sparky!uunet!elroy.jpl.nasa.gov!usc!snorkelwacker.mit.edu!ai-lab!sun-of-smokey!marcus
From: marcus@sun-of-smokey.NoSubdomain.NoDomain (Jeff Marcus)
Newsgroups: comp.ai.neural-nets
Subject: Re: need for unique test sets
Message-ID: <25663@life.ai.mit.edu>
Date: 21 Jul 92 14:10:26 GMT
References: <1992Jul19.070433.5896@afterlife.ncsc.mil> <25633@life.ai.mit.edu> <arms.711688181@spedden>
Sender: news@ai.mit.edu
Organization: MIT/LCS Spoken Language Systems
Lines: 63

In article <arms.711688181@spedden>, arms@cs.UAlberta.CA (Bill Armstrong) writes:
<Deleted lots of stuff with which I agree>

|> Anyway, maybe everyone will disagree with me that either brand of
|> testing, with or without overlap, is still inadequate because:
|>
|> a. without a priori knowledge, no one can know what the "correct"
|> function is, based on a fixed finite sample.
|>
|> b. in general, no set of tests unless they cover the whole space
|> can assure that the neural net output will do what is correct
|> even if you know what "correct" means. There has to be a proof technique
|> somehow.
|>
|> Bill
|>
|> --
|> ***************************************************
|> Prof. William W. Armstrong, Computing Science Dept.
|> University of Alberta; Edmonton, Alberta, Canada T6G 2H1
|> arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071

I guess I disagree with your use of the word "inadequate," but maybe we
are coming from two different cultures. Because I work in speech
recognition, where nobody expects a zero error rate or perfectly
separable classes (and I assume the same applies to OCR), I don't expect
to attain zero error or to show that my classifier has learned some
generating function perfectly. Nor is it reasonable to consider all
possible inputs, since the pool of possible speech is infinite.

However, the more test data I have, the more confident I am that the
error rate on the test data is a good predictor of the classifier's
performance on new data. This can be quantified with a confidence
interval on my classification error. The scheme is only "inadequate" in
the sense that I can never achieve a confidence interval of zero width,
but I don't see why that is important.

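As a concrete sketch of that confidence interval (my own illustration,
not part of the original exchange; the function name is mine): if the
test set is an i.i.d. sample, the error count is binomial, and a normal
approximation gives an interval like the following.

    import math

    def error_confidence_interval(n_errors, n_tests, z=1.96):
        # Normal-approximation ~95% confidence interval on the true
        # classification error, assuming i.i.d. test cases so that
        # the error count is binomially distributed.
        p = n_errors / float(n_tests)
        half_width = z * math.sqrt(p * (1.0 - p) / n_tests)
        return max(0.0, p - half_width), min(1.0, p + half_width)

    # e.g. 120 errors on 1000 test utterances:
    low, high = error_confidence_interval(120, 1000)
    print("error rate in [%.3f, %.3f] at ~95%% confidence" % (low, high))

The half-width shrinks like 1/sqrt(n_tests), so more test data tightens
the interval but never closes it completely.
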
Of course, all this assumes that I have a good way of selecting a
representative sample for my test set, a non-trivial problem in its own
right.

The reason I say two cultures is that I am under the impression that
neural net researchers are also interested in learning whether their
networks can discover functions like XOR, in which case a performance
metric like classification error may be inappropriate. However, my
guess is that as long as some random process generates the training and
test sets, you can frame performance estimation as a statistical
estimation problem, and the same arguments I just made apply: as you
get more data, you become more confident of your result, but you can
never be sure that the result is exactly right.

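To put a number on "more data, more confidence" (again my own
illustration, not from the original post): Hoeffding's inequality says
the measured error on n i.i.d. test cases deviates from the true error
by more than eps with probability at most 2*exp(-2*n*eps^2), so the
test set size needed for a given precision grows like 1/eps^2.

    import math

    def tests_needed(eps, delta=0.05):
        # Smallest n with 2*exp(-2*n*eps**2) <= delta, i.e. the
        # measured error is within eps of the true error with
        # probability at least 1 - delta (Hoeffding's inequality).
        return int(math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2)))

    for eps in (0.05, 0.01, 0.005):
        print("within +/-%.3f at 95%% confidence: n >= %d"
              % (eps, tests_needed(eps)))

Driving eps to zero makes n diverge, which is exactly the sense in which
no finite test set can make you sure the result is exactly right.
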
Jeff