Newsgroups: comp.ai.neural-nets
Path: sparky!uunet!brunix!cs.brown.edu!mpp
From: mpp@cns.brown.edu (Michael P. Perrone)
Subject: Re: Reducing Training time vs Generalisation
Message-ID: <1992Aug19.172222.10441@cs.brown.edu>
Keywords: back propagation, training, generalisation
Sender: mpp@cs.brown.edu (Michael P. Perrone)
Organization: Center for Neural Science, Brown University
References: <arms.714091659@spedden> <36944@sdcc12.ucsd.edu> <arms.714146123@spedden> <36967@sdcc12.ucsd.edu> <1992Aug18.231650.27663@cs.brown.edu> <arms.714214353@spedden>
Date: Wed, 19 Aug 1992 17:22:22 GMT
Lines: 20

If we define a training set Z = {(xi,f(xi))} for i=1,...,n
and we demand that our backprop network fit f(.) exactly on Z,
then clearly the global minimum of the training error is a
network that reproduces f(.) exactly on Z.

If we further demand that, for some range of x not in Z, the
network catastrophically misfits the true function, then what
we are really saying is that our choice of model was bad.

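As a concrete illustration, here is a minimal Python sketch with
polynomial interpolation standing in for the backprop net -- any model
with enough freedom to fit Z exactly behaves the same way. The choice
of Runge's function and all parameter values below are illustrative
assumptions, not anything from the discussion above.

import numpy as np

# Illustrative "true" function f(.) -- Runge's classic example.
f = lambda x: 1.0 / (1.0 + 25.0 * x**2)
n = 12
xi = np.linspace(-1.0, 1.0, n)    # training inputs
yi = f(xi)                        # training set Z = {(xi, f(xi))}

# A degree-(n-1) polynomial has enough freedom to pass through
# all n points of Z exactly.
coeffs = np.polyfit(xi, yi, deg=n - 1)

x_off = np.linspace(-1.0, 1.0, 500)   # mostly points not in Z
print("max |error| on Z: ", np.abs(np.polyval(coeffs, xi) - yi).max())
print("max |error| off Z:", np.abs(np.polyval(coeffs, x_off) - f(x_off)).max())
# The first number is ~0 (exact fit on Z); the second is large,
# with wild oscillations near the ends of the interval.

The exact fit on Z says nothing about the behavior between the xi;
that behavior is the model's doing, not backprop's.
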
Clearly, this is a problem. But it is not just a problem with
backprop. It is a fundamental problem whenever we are forced to
choose a model. Unless we have some a priori knowledge about the
problem, our model choice is always open to pathological solutions.

The above problem is an example of the utility of a priori
smoothness assumptions. For example, instead of using backprop,
we could have chosen a kernel estimator using delta functions
(i.e., no smoothing). In this case, our error over Z would again
be zero, but our fidelity to the training set can again lead us
to "wild" solutions (which is just another way of saying high
variance).

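A minimal sketch of that comparison, assuming a Nadaraya-Watson
kernel estimator with a Gaussian kernel whose bandwidth h is taken
near zero to play the role of the delta function (a literal delta
kernel is undefined between training points). The function, noise
level, and bandwidth values are all illustrative assumptions.

import numpy as np

def nadaraya_watson(x, xi, yi, h):
    # Kernel regression estimate at points x with bandwidth h.
    w = np.exp(-0.5 * ((x[:, None] - xi[None, :]) / h) ** 2)
    return (w * yi).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)
xi = np.linspace(-1.0, 1.0, 20)                    # training inputs
yi = f(xi) + 0.3 * rng.standard_normal(xi.size)    # noisy training set Z

x = np.linspace(-1.0, 1.0, 400)
for h in (0.02, 0.2):    # near-delta kernel vs. a smoothed kernel
    err_on_Z = np.abs(nadaraya_watson(xi, xi, yi, h) - yi).max()
    err_vs_f = np.abs(nadaraya_watson(x, xi, yi, h) - f(x)).max()
    print(f"h={h}: max |error| on Z = {err_on_Z:.3f}, "
          f"max |error| vs true f = {err_vs_f:.3f}")

With h near zero the estimator reproduces the noisy yi exactly --
noise included -- and jumps between them; a moderate h gives up the
exact fit on Z in exchange for a stabler estimate.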