NetNews Usenet Archive 1992 #16

Internet Message Format | 1992-07-23 | 2.0 KB

Path: sparky!uunet!snorkelwacker.mit.edu!ai-lab!sun-of-smokey!marcus
From: marcus@sun-of-smokey.NoSubdomain.NoDomain (Jeff Marcus)
Newsgroups: comp.ai.neural-nets
Subject: Re: need for unique test sets
Message-ID: <25761@life.ai.mit.edu>
Date: 23 Jul 92 14:07:31 GMT
References: <1992Jul19.070433.5896@afterlife.ncsc.mil> <arms.711645136@spedden> <1992Jul21.224019.6615@u.washington.edu>
Sender: news@ai.mit.edu
Organization: MIT/LCS Spoken Language Systems
Lines: 35

In article <1992Jul21.224019.6615@u.washington.edu>, davisd@milton.u.washington.edu (Daniel Davis) writes:
|> I hope I can clear up this debate with a little specificity.
|>
|> A couple guys say that all we need is independent sampling, while
|> someone else seems to think that one should not include the training
|> data in the test set.
|>
|> Independent sampling is in fact all you need, but given the proper
|> context, it is also proper to say that one should not include any of
|> the training data in the test set.
|>
|> Suppose you take 10000 independent samples. You use 5000 as your
|> training set. You would *not* select your test set from all 10000
|> samples, but instead, only from the 5000 not included in the training
|> set. If you selected test data from a random sampling of all 10000
|> samples, your test data and your training data would no longer be
|> independent. Of the data you have, only the 5000 previously unselected
|> data correspond to data independent of your training set. In this
|> sense, then, one should not include any of the training data in the
|> test data.
|>
|> However, it is *not* a problem if it happens that some of the 5000
|> previously unselected data are in fact repeats of the original
|> training data, as it is assumed that the original 10000 were
|> independent samples.
|>
|> Buy Buy -- Dan Davis
|> Univ. of Washington, Dept. of EE, davisd@u.washington.edu

Exactly. I should have been this specific in making my argument.
It might have saved some bandwidth.

Jeff
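
The sampling scheme described in the quoted article can be sketched in Python as below. This is only an illustration, not code from the thread: the function name `split_train_test`, the test-set size of 1000, the seed, and the toy data (integers drawn uniformly from 0..99, so that repeated values are common) are all assumptions made for the example.

```python
import random


def split_train_test(n_samples, n_train, n_test, rng):
    """Return (train_idx, test_idx) such that no draw appears in both."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    train_idx = indices[:n_train]            # draws used for training
    held_out = indices[n_train:]             # draws never seen in training
    test_idx = rng.sample(held_out, n_test)  # test set comes only from these
    return train_idx, test_idx


rng = random.Random(0)  # fixed seed (assumption) so the run is repeatable

# Hypothetical data: 10000 independent draws from a small discrete range.
samples = [rng.randrange(100) for _ in range(10000)]

train_idx, test_idx = split_train_test(len(samples), 5000, 1000, rng)

# No individual draw is shared between the two sets ...
shared_draws = set(train_idx) & set(test_idx)

# ... but a held-out draw may still carry a value that also occurs in training.
train_values = {samples[i] for i in train_idx}
repeated_values = sum(1 for i in test_idx if samples[i] in train_values)

print("draws shared by train and test:", len(shared_draws))
print("test items whose value also occurs in training:", repeated_values)
```

The split is done on draw indices rather than on values. That keeps the test set disjoint from the training draws while leaving coincidental repeats among the held-out draws in place, which is the harmless case the article mentions.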