Path: sparky!uunet!elroy.jpl.nasa.gov!usc!snorkelwacker.mit.edu!ai-lab!sun-of-smokey!marcus
From: marcus@sun-of-smokey.NoSubdomain.NoDomain (Jeff Marcus)
Newsgroups: comp.ai.neural-nets
Subject: Re: need for unique test sets
Message-ID: <25663@life.ai.mit.edu>
Date: 21 Jul 92 14:10:26 GMT
References: <1992Jul19.070433.5896@afterlife.ncsc.mil> <25633@life.ai.mit.edu> <arms.711688181@spedden>
Sender: news@ai.mit.edu
Organization: MIT/LCS Spoken Language Systems
Lines: 63

In article <arms.711688181@spedden>, arms@cs.UAlberta.CA (Bill Armstrong) writes:
<Deleted lots of stuff with which I agree>

|> Anyway, maybe everyone will disagree with me that either brand of
|> testing, with or without overlap, is still inadequate because:
|>
|> a. without a priori knowledge, no one can know what the "correct"
|> function is, based on a fixed finite sample.
|>
|> b. in general, no set of tests unless they cover the whole space
|> can assure that the neural net output will do what is correct
|> even if you know what "correct" means. There has to be a proof technique
|> somehow.
|>
|> Bill
|>
|> --
|> ***************************************************
|> Prof. William W. Armstrong, Computing Science Dept.
|> University of Alberta; Edmonton, Alberta, Canada T6G 2H1
|> arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071

I guess I disagree with your use of the word "inadequate," but maybe we
are coming from two different cultures. Because I work in speech
recognition, where nobody expects a zero error rate or perfectly
separable classes (and I assume the same applies to OCR), I don't expect
to attain zero error or to show that my classifier has learned some
generating function perfectly. Nor is it reasonable to consider all
possible inputs, since the pool of possible speech is infinite.

However, the more test data I have, the more confident I am that the
error rate on the test data is a good predictor of the classifier's
performance on new data. This can be quantified with a confidence
interval on my classification error. The scheme is only "inadequate" in
the sense that I can never achieve a confidence interval of zero width,
but I don't see why that is important.

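As a concrete sketch of that confidence interval (my own illustration,
not part of the original exchange; the function name is mine): if the
test set is an i.i.d. sample, the error count is binomial, and a normal
approximation gives an interval like the following.

    import math

    def error_confidence_interval(n_errors, n_tests, z=1.96):
        # Normal-approximation ~95% confidence interval on the true
        # classification error, assuming i.i.d. test cases so that
        # the error count is binomially distributed.
        p = n_errors / float(n_tests)
        half_width = z * math.sqrt(p * (1.0 - p) / n_tests)
        return max(0.0, p - half_width), min(1.0, p + half_width)

    # e.g. 120 errors on 1000 test utterances:
    low, high = error_confidence_interval(120, 1000)
    print("error rate in [%.3f, %.3f] at ~95%% confidence" % (low, high))

The half-width shrinks like 1/sqrt(n_tests), so more test data tightens
the interval but never closes it completely.
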
Of course, all this assumes that I have a good way of selecting a
representative sample for my test set, a non-trivial problem in its own
right.

The reason I say two cultures is that I am under the impression that
neural net researchers are also interested in learning whether their
networks can discover functions like XOR, in which case a performance
metric like classification error may be inappropriate. However, my
guess is that as long as some random process generates the training and
test sets, you can frame performance estimation as a statistical
estimation problem, and the same arguments I just made apply: as you
get more data, you become more confident of your result, but you can
never be sure that the result is exactly right.

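To put a number on "more data, more confidence" (again my own
illustration, not from the original post): Hoeffding's inequality says
the measured error on n i.i.d. test cases deviates from the true error
by more than eps with probability at most 2*exp(-2*n*eps^2), so the
test set size needed for a given precision grows like 1/eps^2.

    import math

    def tests_needed(eps, delta=0.05):
        # Smallest n with 2*exp(-2*n*eps**2) <= delta, i.e. the
        # measured error is within eps of the true error with
        # probability at least 1 - delta (Hoeffding's inequality).
        return int(math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2)))

    for eps in (0.05, 0.01, 0.005):
        print("within +/-%.3f at 95%% confidence: n >= %d"
              % (eps, tests_needed(eps)))

Driving eps to zero makes n diverge, which is exactly the sense in which
no finite test set can make you sure the result is exactly right.
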
Jeff