Newsgroups: comp.ai.neural-nets
Path: sparky!uunet!brunix!cs.brown.edu!mpp
From: mpp@cns.brown.edu (Michael P. Perrone)
Subject: Re: Reducing Training time vs Generalisation
Message-ID: <1992Aug19.172222.10441@cs.brown.edu>
Keywords: back propagation, training, generalisation
Sender: mpp@cs.brown.edu (Michael P. Perrone)
Organization: Center for Neural Science, Brown University
References: <arms.714091659@spedden> <36944@sdcc12.ucsd.edu> <arms.714146123@spedden> <36967@sdcc12.ucsd.edu> <1992Aug18.231650.27663@cs.brown.edu> <arms.714214353@spedden>
Date: Wed, 19 Aug 1992 17:22:22 GMT
Lines: 20

If we define a training set Z = {(xi,f(xi))} for i=1,...,n
and we demand that our backprop network fit f(.) exactly on Z,
then clearly the global minimum of the training error is a
network that reproduces f(.) exactly on Z.

If we further demand that, for some range of x not in Z, the
network catastrophically misfits the true function, then what
we are really saying is that our choice of model was bad.

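As a concrete illustration, here is a minimal Python sketch with
polynomial interpolation standing in for the backprop net -- any model
with enough freedom to fit Z exactly behaves the same way. The choice
of Runge's function and all parameter values below are illustrative
assumptions, not anything from the discussion above.

import numpy as np

# Illustrative "true" function f(.) -- Runge's classic example.
f = lambda x: 1.0 / (1.0 + 25.0 * x**2)
n = 12
xi = np.linspace(-1.0, 1.0, n)    # training inputs
yi = f(xi)                        # training set Z = {(xi, f(xi))}

# A degree-(n-1) polynomial has enough freedom to pass through
# all n points of Z exactly.
coeffs = np.polyfit(xi, yi, deg=n - 1)

x_off = np.linspace(-1.0, 1.0, 500)   # mostly points not in Z
print("max |error| on Z: ", np.abs(np.polyval(coeffs, xi) - yi).max())
print("max |error| off Z:", np.abs(np.polyval(coeffs, x_off) - f(x_off)).max())
# The first number is ~0 (exact fit on Z); the second is large,
# with wild oscillations near the ends of the interval.

The exact fit on Z says nothing about the behavior between the xi;
that behavior is the model's doing, not backprop's.
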
Clearly, this is a problem. But it is not just a problem with
backprop. It is a fundamental problem whenever we are forced to
choose a model. Unless we have some a priori knowledge about the
problem, our model choice is always open to pathological solutions.

The above problem is an example of the utility of a priori
smoothness assumptions. For example, instead of using backprop,
we could have chosen a kernel estimator using delta functions
(i.e., no smoothing). In this case, our error over Z would again
be zero, but our fidelity to the training set can again lead us
to "wild" solutions (which is just another way of saying high
variance).

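A minimal sketch of that comparison, assuming a Nadaraya-Watson
kernel estimator with a Gaussian kernel whose bandwidth h is taken
near zero to play the role of the delta function (a literal delta
kernel is undefined between training points). The function, noise
level, and bandwidth values are all illustrative assumptions.

import numpy as np

def nadaraya_watson(x, xi, yi, h):
    # Kernel regression estimate at points x with bandwidth h.
    w = np.exp(-0.5 * ((x[:, None] - xi[None, :]) / h) ** 2)
    return (w * yi).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)
xi = np.linspace(-1.0, 1.0, 20)                    # training inputs
yi = f(xi) + 0.3 * rng.standard_normal(xi.size)    # noisy training set Z

x = np.linspace(-1.0, 1.0, 400)
for h in (0.02, 0.2):    # near-delta kernel vs. a smoothed kernel
    err_on_Z = np.abs(nadaraya_watson(xi, xi, yi, h) - yi).max()
    err_vs_f = np.abs(nadaraya_watson(x, xi, yi, h) - f(x)).max()
    print(f"h={h}: max |error| on Z = {err_on_Z:.3f}, "
          f"max |error| vs true f = {err_vs_f:.3f}")

With h near zero the estimator reproduces the noisy yi exactly --
noise included -- and jumps between them; a moderate h gives up the
exact fit on Z in exchange for a stabler estimate.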