NetNews Usenet Archive 1992 #16

Text File | 1992-07-21 | 1.7 KB | 39 lines

Newsgroups: comp.ai.neural-nets
Path: sparky!uunet!usc!rpi!batcomputer!cornell!uw-beaver!news.u.washington.edu!milton.u.washington.edu!davisd
From: davisd@milton.u.washington.edu (Daniel Davis)
Subject: Re: need for unique test sets
Message-ID: <1992Jul21.224019.6615@u.washington.edu>
Sender: news@u.washington.edu (USENET News System)
Organization: University of Washington, Seattle
References: <1992Jul19.070433.5896@afterlife.ncsc.mil> <arms.711645136@spedden>
Date: Tue, 21 Jul 1992 22:40:19 GMT
Lines: 27

I hope I can clear up this debate with a little specificity. A couple of guys say that all we need is independent sampling, while someone else seems to think that one should not include the training data in the test set. Independent sampling is in fact all you need, but given the proper context, it is also correct to say that one should not include any of the training data in the test set.

Suppose you take 10000 independent samples. You use 5000 as your training set. You would *not* select your test set from all 10000 samples, but only from the 5000 not included in the training set. If you selected test data from a random sampling of all 10000 samples, your test data and your training data would no longer be independent. Of the data you have, only the 5000 previously unselected samples are independent of your training set. In this sense, then, one should not include any of the training data in the test data.

However, it is *not* a problem if some of the 5000 previously unselected samples happen to be repeats of the training data (that is, have the same values as some training examples), since the original 10000 are assumed to be independent samples.

Bye bye

--
Dan Davis
Univ. of Washington, Dept. of EE, davisd@u.washington.edu
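The split described in the post can be sketched in Python as follows. This is a minimal illustration, not code from the original thread: the helper `split_by_index`, the seed values, and the toy data (10000 independent draws of integers from 0 to 99, chosen so that equal values recur) are all hypothetical. The sketch splits on sample *indices*, checks that no individual sample is shared between the training and test sets, shows that shared *values* still occur, and contrasts this with the naive approach of drawing the test set from all 10000 samples.

```python
import random


def split_by_index(samples, n_train, seed=0):
    """Hypothetical helper: shuffle sample *indices* and split them so that
    no individual sample lands in both the training and the test set."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    train_idx = indices[:n_train]
    test_idx = indices[n_train:]
    train = [samples[i] for i in train_idx]
    test = [samples[i] for i in test_idx]
    return train, test, train_idx, test_idx


# Toy data (an assumption for illustration): 10000 independent draws from
# the integers 0..99, so identical values are certain to recur.
data_rng = random.Random(42)
samples = [data_rng.randint(0, 99) for _ in range(10000)]

# Approach from the post: train on 5000 samples, test only on the other 5000.
train, test, train_idx, test_idx = split_by_index(samples, 5000)
print("samples shared by train and test:", len(set(train_idx) & set(test_idx)))

# Repeated *values* across the two sets are expected and harmless,
# because every sample was drawn independently.
shared_values = set(train) & set(test)
print("distinct values seen in both sets:", len(shared_values))

# Naive approach: drawing the test set from all 10000 samples
# reuses some of the training samples themselves.
naive_test_idx = data_rng.sample(range(len(samples)), 5000)
print("training samples reused by naive test set:",
      len(set(train_idx) & set(naive_test_idx)))
```

Splitting on indices rather than on values mirrors the argument in the post: independence is a property of how the samples were drawn, so two samples with equal values are still distinct, independent draws.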