- Newsgroups: comp.ai.neural-nets
- Path: sparky!uunet!gumby!destroyer!ubc-cs!alberta!arms
- From: arms@cs.UAlberta.CA (Bill Armstrong)
- Subject: Re: Wild values (was Reducing Training time ...)
- Message-ID: <arms.714256557@spedden>
- Keywords: back propagation, training, generalisation
- Sender: news@cs.UAlberta.CA (News Administrator)
- Nntp-Posting-Host: spedden.cs.ualberta.ca
- Organization: University of Alberta, Edmonton, Canada
- References: <arms.714146123@spedden> <36967@sdcc12.ucsd.edu> <arms.714208873@spedden> <37028@sdcc12.ucsd.edu>
- Date: Wed, 19 Aug 1992 20:35:57 GMT
- Lines: 101
-
- demers@cs.ucsd.edu (David DeMers) writes:
-
- >In article <arms.714208873@spedden> arms@cs.UAlberta.CA (Bill Armstrong) writes:
- >>Pick any set of integers that contains at least the six points x =
- >>-2 -1 0 1 2 3, each one with the f(x) value specified below.
- >>Test on any finite set of integers you like.
-
- >Integers don't seem to be very representative of the domain...
-
- It isn't important.
-
- >Well, I have grasped the idea and I understand how one can *construct*
- >these examples, but you haven't shown me how you can actually get
- >this or any similar example by following an optimization of
- >weights to minimize an objective function like mean squared
- >error over a set of data...
-
- I think I have, but I'll present an argument below.
-
- >>If you happened to initialize the system by chance to the given
- >>weights, which do produce the desired values on the training set,
-
- >pretty close to a set of zero measure...
-
- True. But the chosen state is stable, which was my point.
-
- >>the
- >>BP algorithm would have 0 mean square error on the training set, and
- >>would not change the weights. In other words, the weights (+ or - 40)
- >>are stable, and you can reach them. Maybe there are starting points
- >>from which you can't reach them, but that's a different problem to
- >>find them.
-
- >OK, my claim is that the weights you've given are not
- >an attractor, or that if they are,
- >the basin of attraction in weight space is pretty small.
-
- >[by attractor I mean a stable point of the dynamical system
- >consisting of applying BP to the training data for this net,
- >where small perturbations from the attractor result in the system
- >converging to it... see, maybe, Wiggins' book or Guckenheimer
- >and Holmes for more details (many many more details :-)
- >the basin of attraction is the region within which the
- >algorithm will result in the attractor being reached (at
- >least asymptotically ]
-
- >I don't have a proof...
-
- OK, let's see if I can prove it IS an attractor.
-
- The squared error on the training set is an infinitely differentiable
- function of the weights, and is >= 0 everywhere. At the weights I
- gave, the error is 0. The first partial derivatives must be 0 at the
- minimum. Now, using long-forgotten calculus arguments, we conclude
- that the second derivatives typical of a minimum are usually positive
- (with some unfortunate special cases where they are 0). Hence the
- Taylor expansion shows a nice little attractive bowl. Sorry, this is
- not a proof, but you get the idea, and someone who is up on calculus
- could finish it.
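-
- To spell that out a little more (a sketch only, writing E(w) for the
- squared error over the training set as a function of the weight vector
- w, and w* for the weights I gave): since E(w) >= 0 everywhere and
- E(w*) = 0, the point w* is a global minimum, so \nabla E(w*) = 0 and
- the Hessian H = \nabla^2 E(w*) is positive semidefinite. Taylor
- expansion about w* then gives
-
-     E(w* + \delta) = (1/2) \delta^T H \delta + O(||\delta||^3).
-
- In the usual case where H is actually positive definite, the BP update
- w <- w - \eta \nabla E(w) moves w toward w* for any small enough
- learning rate \eta, everywhere in a neighbourhood of w*, which is
- exactly the attractor-with-a-basin property.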
-
- >>>The "wildness" here is postulated; I still don't see how it can
- >>>actually happen on your facts, that the network was trained to
- >>>zero error on a training set of integer values.
-
- >>The "wild" solution is not postulated, it is THE set of weights which
- >>gives 0 error on the training set. The wild solution is forced upon
- >>the net by the training data.
-
- >I'm sceptical of this fact.
-
- There may be some isomorphisms that make the solution not unique, but
- that is probably not the grounds for your scepticism. Why don't you
- just run a BP program and see where it converges? If it reaches
- zero error, then you will have THE solution.
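-
- If it helps, here is a minimal BP run to try. It is only a sketch: the
- six f(x) targets from my earlier article are not repeated in this post,
- so the TARGETS line below is a placeholder to be filled in, and the
- 1-2-1 net with two sigmoid hidden units and a linear output is just an
- assumption about the example architecture, not necessarily the net we
- have been discussing.
-
-   # Minimal backprop (plain gradient descent on MSE) over the six
-   # integer points x = -2..3. TARGETS is a placeholder: substitute the
-   # f(x) values from the earlier post.
-   import numpy as np
-
-   rng = np.random.default_rng(0)
-   X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
-   TARGETS = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])  # placeholder values
-
-   def sigmoid(z):
-       return 1.0 / (1.0 + np.exp(-z))
-
-   W1 = rng.normal(scale=0.1, size=2); b1 = np.zeros(2)  # input -> hidden
-   W2 = rng.normal(scale=0.1, size=2); b2 = 0.0          # hidden -> output
-
-   eta = 0.5
-   for step in range(100000):
-       h = sigmoid(np.outer(X, W1) + b1)      # (6, 2) hidden activations
-       y = h @ W2 + b2                        # (6,) network outputs
-       err = y - TARGETS
-       dy = 2.0 * err / len(X)                # d(MSE)/d(output)
-       dh = np.outer(dy, W2) * h * (1.0 - h)  # backprop through sigmoid
-       W2 -= eta * (h.T @ dy); b2 -= eta * dy.sum()
-       W1 -= eta * (X @ dh);   b1 -= eta * dh.sum(axis=0)
-
-   print("final MSE:", np.mean(err ** 2))
-   print("largest |weight|:", max(np.abs(W1).max(), np.abs(W2).max()))
-
- If it does drive the error to zero, the interesting part is how large
- the weights end up being.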
-
- ...
-
- >>Sure, but a lot of little weights can add up, particularly if values
- >>derived from them get multiplied by a larger weight.
-
- >You can also observe the weights of your network, output only,
- >and put an upper bound on its value.
-
- I think you think you can observe all this in high-dimensional spaces, but
- your lifespan isn't long enough.
-
- >...
-
- >> Do you always bound your weights in absolute value by
- >>small numbers?
-
- >For initialization, yes.
-
- Sounds fine, but what do you do if a weight gets big?
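-
- If you did want to enforce a bound throughout training, and not only at
- initialization, one simple option (only an illustration on my part,
- assuming NumPy weight arrays like those in the sketch above) is to clip
- every weight back into [-bound, bound] after each update:
-
-   import numpy as np
-
-   def clip_weights(weight_arrays, bound=5.0):
-       """Clip every entry of every weight array into [-bound, bound]."""
-       for W in weight_arrays:
-           np.clip(W, -bound, bound, out=W)
-
-   # e.g. after each gradient step: clip_weights([W1, b1, W2])
-
- Whether clipping like that helps or hurts the fit is, of course, a
- separate question.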
-
- Thanks for your comments.
- --
- ***************************************************
- Prof. William W. Armstrong, Computing Science Dept.
- University of Alberta; Edmonton, Alberta, Canada T6G 2H1
- arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071
-