Path: sparky!uunet!zaphod.mps.ohio-state.edu!magnus.acs.ohio-state.edu!usenet.ins.cwru.edu!agate!overload.lbl.gov!lll-winken!tazdevil!henrik
From: henrik@mpci.llnl.gov (Henrik Klagges)
Newsgroups: comp.ai.neural-nets
Subject: How to correctly measure time series generalization (?)
Message-ID: <?.711737203@tazdevil>
Date: 21 Jul 92 16:46:43 GMT
Sender: usenet@lll-winken.LLNL.GOV
Lines: 30
Nntp-Posting-Host: tazdevil.llnl.gov

Suppose a time series is available, from t=0 to t=99. How should these
datapoints be partitioned into training and test sets? (It is assumed that a
training/test vector at time t is made up of (t-k, t-k+1, ..., t) as inputs
and (t+1) as the target output.)
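
For concreteness, a minimal sketch of how such vectors could be built
(Python with NumPy, purely for illustration; 'make_vectors' is my name):

import numpy as np

def make_vectors(series, k):
    # Input vector at time t is (series[t-k], ..., series[t]);
    # the target is series[t+1].
    inputs, targets = [], []
    for t in range(k, len(series) - 1):
        inputs.append(series[t - k : t + 1])
        targets.append(series[t + 1])
    return np.array(inputs), np.array(targets)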

a) Use 0->X for training, (X+1)->99 for testing;
b) Use n randomly selected t's for training, and the rest for testing.

'a' is straightforward extrapolation, but it fails if the training 'window'
of the time series is too small to capture all major cycles in the series (a
cycle not visible in 0->X only shows up in (X+1)->99). This is likely the
case with, e.g., sunspot data. If the network's approximation captures the
complete problem mechanics, though, a solution trained under 'a' works
satisfactorily for any time t.
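
In the sketch above, 'a' is just a cut at a fixed position (again, 'split_a'
and its signature are my own illustration):

def split_a(inputs, targets, cut):
    # Scheme (a): vectors before 'cut' train, the rest test.
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])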

'b' is a bit of a cheat - it is interpolation. However, for practical purposes
like forecasting only the next ('99+1') sunspot, it may be more reliable than
'a'.
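
'b' in the same sketch ('split_b' is again my own name; the seed is only
there to make the random split reproducible):

def split_b(inputs, targets, n, seed=0):
    # Scheme (b): n randomly selected vectors train, the rest test.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(inputs))
    train, test = order[:n], order[n:]
    return (inputs[train], targets[train]), (inputs[test], targets[test])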

Comments, please.

Cheers, Henrik

massively parallel group at Lawrence Livermore
IBM Research physics group Munich