- Newsgroups: comp.ai.neural-nets
- Path: sparky!uunet!usc!rpi!batcomputer!cornell!uw-beaver!news.u.washington.edu!milton.u.washington.edu!davisd
- From: davisd@milton.u.washington.edu (Daniel Davis)
- Subject: Re: need for unique test sets
- Message-ID: <1992Jul21.224019.6615@u.washington.edu>
- Sender: news@u.washington.edu (USENET News System)
- Organization: University of Washington, Seattle
- References: <1992Jul19.070433.5896@afterlife.ncsc.mil> <arms.711645136@spedden>
- Date: Tue, 21 Jul 1992 22:40:19 GMT
- Lines: 27
-
- I hope I can clear up this debate with a little specificity.
-
- A couple of guys say that all we need is independent sampling, while
- someone else seems to think that one should not include the training
- data in the test set.
-
- Independent sampling is in fact all you need, but given the proper
- context, it is also proper to say that one should not include any of
- the training data in the test set.
-
- Suppose you take 10000 independent samples. You use 5000 as your
- training set. You would *not* select your test set from all 10000
- samples, but instead, only from the 5000 not included in the training
- set. If you selected test data by sampling at random from all 10000,
- about half of it would, on average, come from the training set, so
- your test data and your training data would no longer be independent.
- Of the data you have, only the 5000 previously unselected samples are
- independent of your training set. In this sense, then, one should not
- include any of the training data in the test data.
-
- However, it is *not* a problem if it happens that some of the 5000
- previously unselected samples have the same values as samples in the
- training set, as it is assumed that the original 10000 were
- independent samples.
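-
- To make the distinction concrete, here is a minimal sketch in Python.
- The names (split_by_index, heldout_idx), the 0..99 value range, and the
- test-set size of 1000 are assumptions made up for illustration. The
- split is done on sample positions, so the test set never reuses a
- training draw, even though many test values will equal training values.

```python
import random

def split_by_index(n_samples, n_train, seed=0):
    """Partition positions 0..n_samples-1 into disjoint train and held-out lists."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    return order[:n_train], order[n_train:]

# Hypothetical data: 10000 independent draws from a small discrete range,
# so repeated values are guaranteed to occur.
data_rng = random.Random(42)
samples = [data_rng.randint(0, 99) for _ in range(10000)]

# Use 5000 draws for training; the other 5000 form the held-out pool.
train_idx, heldout_idx = split_by_index(len(samples), 5000)

# Select the test set only from the held-out pool, never from all 10000.
test_idx = random.Random(1).sample(heldout_idx, 1000)

# No training draw is reused as a test draw.
assert set(train_idx).isdisjoint(test_idx)

# Test draws whose *value* also occurs in the training set are harmless:
# they are still independent draws from the same source.
train_values = {samples[i] for i in train_idx}
repeats = sum(samples[i] in train_values for i in test_idx)
print(f"{repeats} of {len(test_idx)} test draws repeat a training value")
```

- Sampling test_idx from range(10000) instead of heldout_idx is exactly
- the mistake described above.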
-
- Buy Buy -- Dan Davis
- Univ. of Washington, Dept. of EE, davisd@u.washington.edu
-