Path: sparky!uunet!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!sdd.hp.com!mips!darwin.sura.net!haven.umd.edu!mimsy!afterlife!hcbarth
From: hcbarth@afterlife.ncsc.mil (Bart Bartholomew)
Newsgroups: comp.ai.neural-nets
Subject: Re: re:need for unique test sets
Message-ID: <1992Jul22.031319.15531@afterlife.ncsc.mil>
Date: 22 Jul 92 03:13:19 GMT
References: <1992Jul19.070433.5896@afterlife.ncsc.mil> <25633@life.ai.mit.edu>
Organization: The Great Beyond
Lines: 93

In article <25633@life.ai.mit.edu> marcus@goldilocks.lcs.mit.edu (Jeff Marcus) writes:
>In article <1992Jul19.070433.5896@afterlife.ncsc.mil>,
>hcbarth@afterlife.ncsc.mil (Bart Bartholomew) writes:
>|>
>|> If you want to have any confidence that your network has
>|> deduced the correct generating function from the training set,
>|> you must be sure that the test set has no members from the
>|> training set. Ponder: If the net gets some number correct on
>|> the training set, and if the test set has some members from the
>|> training set, then the apparent performance metric on the test
>|> set will be skewed by the contribution of the members of the
>|> training set.
>|> Since there is a large, possibly infinite set of
>|> functions that can generate any data set, the question we pose
>|> to the test set is whether we have found the correct function.
>|> The answer to that question is measured by how well the net can
>|> 'generalize' - can the function that the net found to explain the
>|> training set also explain the test set?
>|> Clearly, the larger both sets are (up to the point where
>|> all possible members of the function are included), the more
>|> confidence we have in the answer.
>|> So, I disagree with Prof Armstrong and you about the need
>|> for having no overlap between the training and test sets.
>|> Sincerely,
>|> Bart
>|>
>|> --
>
>I stand by my original comments. I don't know what you mean by
>"a contribution by some members of the training set." If the two
>sets are chosen independently, there is no contribution; the test
>set is just reflecting the fact that it is possible to have data
>in the population that has been captured in the training set. If
>the two sets are not chosen independently, you are doing something
>wrong.
>
>One thing that I am unclear about is:
>
>Are you drawing samples from a finite, discrete-valued population
>or a continuous-valued one? If the latter, you don't have to worry
>about having the exact same sample in the test set. If the former,
>then your argument would imply that the more training data you
>have, the smaller the allowable test set for testing your network.
>So let's say you include all possible test patterns but one in
>your training set. Then by your argument, the test set can consist
>of only one pattern. Does the performance on this one pattern give
>a better indication of the network's performance, or does the
>performance on some test set that reflects the distribution of
>what you are likely to see in using the network? It's obvious
>that the latter is better.
>
> Jeff

Let me try again.
    If you have trained the net on the training set, and the net
gets all the answers right according to some arbitrary measure (not
necessarily MSE), AND if your test set contains some of the same
input/output pairs as the training set, then the net will always
get those pairs right, making the apparent success on the test set
look better than it really is. Unless, of course, it gets all of
the test set right, in which case the point is probably moot.
    The point of having a test set (taken from the same source as
the training set) is to make sure the net has found the right (or
an equivalent) function.
    On the other hand, if the net does well on the training set but
falls apart (scores badly) on the test set, you know that the net
has found a nice function that describes the training set well, but
*is not the function that actually generated the data*. In that
case, the net is worthless. If it scores pretty well on the test
set, but not as well as on the training set, then you have probably
found a first cousin of the correct function and may be able to
coax the net into finding the correct one.
    About the second point (the size of the training and test
sets) - some functions have too many possible points to make it
practical to train on even a small fraction of them, so you can
have very large sets for both training and testing. Generally, I
put most of my eggs in the training set and hold out 10%-25% for
testing. Bear in mind that you may not know what the generating
function is, yet still have a reasonable estimate of its
dimensions. Indeed, to have any hope of succeeding, the input
layer must be large enough to be sure that the output is dependent
on the input.
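    The 10%-25% holdout scheme might look like this (an
illustrative sketch; the function name, fraction, and toy data are
assumptions, not from the post). Duplicate input/output pairs are
dropped first so that the two sets cannot share a member:

```python
import random

def holdout_split(pairs, test_fraction=0.2, seed=0):
    """Split input/output pairs into disjoint training and test sets."""
    pairs = list(dict.fromkeys(pairs))     # drop duplicate pairs first
    rng = random.Random(seed)
    rng.shuffle(pairs)                     # shuffle once, reproducibly
    n_test = max(1, int(len(pairs) * test_fraction))
    return pairs[n_test:], pairs[:n_test]  # (training set, test set)

data = [(x, x % 3) for x in range(100)]    # toy generating function
train, test = holdout_split(data)
assert not set(train) & set(test)          # no overlap between the sets
```

Shuffling before the split keeps both sets drawn from the same
source, which is the condition stated above for the test to mean
anything.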
Bart
--
"It's not the thing you fling, the fling's the thing." - Chris Stevens
If there's one thing I just can't stand, it's intolerance.
*No One* is responsible for my views, I'm a committee. Please do not
infer that which I do not imply. hcbarth@afterlife.ncsc.mil