- Newsgroups: comp.ai.neural-nets
- Path: sparky!uunet!gumby!destroyer!ubc-cs!alberta!arms
- From: arms@cs.UAlberta.CA (Bill Armstrong)
- Subject: Re: Wild values (was Reducing Training time ...)
- Message-ID: <arms.714256557@spedden>
- Keywords: back propagation, training, generalisation
- Sender: news@cs.UAlberta.CA (News Administrator)
- Nntp-Posting-Host: spedden.cs.ualberta.ca
- Organization: University of Alberta, Edmonton, Canada
- References: <arms.714146123@spedden> <36967@sdcc12.ucsd.edu> <arms.714208873@spedden> <37028@sdcc12.ucsd.edu>
- Date: Wed, 19 Aug 1992 20:35:57 GMT
- Lines: 101
-
- demers@cs.ucsd.edu (David DeMers) writes:
-
- >In article <arms.714208873@spedden> arms@cs.UAlberta.CA (Bill Armstrong) writes:
- >>Pick any set of integers that contains at least the six points x =
- >>-2 -1 0 1 2 3, each one with the f(x) value specified below.
- >>Test on any finite set of integers you like.
-
- >Integers don't seem to be very representative of the domain...
-
- It isn't important.
-
- >Well, I have grasped the idea and I understand how one can *construct*
- >these examples, but you haven't shown me how you can actually get
- >this or any similar example by following an optimization of
- >weights to minimize an objective function like mean squared
- >error over a set of data...
-
- I think I have, but I'll present an argument below.
-
- >>If you happened to initialize the system by chance to the given
- >>weights, which do produce the desired values on the training set,
-
- >pretty close to a set of zero measure...
-
- True. But the chosen state is stable, which was my point.
-
- >>the
- >>BP algorithm would have 0 mean square error on the training set, and
- >>would not change the weights. In other words, the weights (+ or - 40)
- >>are stable, and you can reach them. Maybe there are starting points
- >>from which you can't reach them, but that's a different problem to
- >>find them.
-
- >OK, my claim is that the weights you've given are not
- >an attractor, or that if they are,
- >the basin of attraction in weight space is pretty small.
-
- >[by attractor I mean a stable point of the dynamical system
- >consisting of applying BP to the training data for this net,
- >where small perturbations from the attractor result in the system
- >converging to it... see, maybe, Wiggins' book or Guckenheimer
- >and Holmes for more details (many many more details :-)
- >the basin of attraction is the region within which the
- >algorithm will result in the attractor being reached (at
- >least asymptotically ]
-
- >I don't have a proof...
-
- OK, let's see if I can prove it IS an attractor.
-
- The squared error on the training set is an infinitely differentiable
- function of the weights, and is >= 0 everywhere. At the weights I
- gave, the error is 0. The first partial derivatives must be 0 at the
- minimum. Now, using long-forgotten calculus arguments, we conclude
- that the second derivatives typical of a minimum are usually positive
- (with some unfortunate special cases where they are 0). Hence the
- Taylor expansion shows a nice little attractive bowl. Sorry, this is
- not a proof, but you get the idea, and someone who is up on calculus
- could finish it.
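-
- To spell that out a little more (a sketch only, writing E(w) for the
- squared error over the training set as a function of the weight vector
- w, and w* for the weights I gave): since E(w) >= 0 everywhere and
- E(w*) = 0, the point w* is a global minimum, so \nabla E(w*) = 0 and
- the Hessian H = \nabla^2 E(w*) is positive semidefinite. Taylor
- expansion about w* then gives
-
-     E(w* + \delta) = (1/2) \delta^T H \delta + O(||\delta||^3).
-
- In the usual case where H is actually positive definite, the BP update
- w <- w - \eta \nabla E(w) moves w toward w* for any small enough
- learning rate \eta, everywhere in a neighbourhood of w*, which is
- exactly the attractor-with-a-basin property.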
-
- >>>The "wildness" here is postulated; I still don't see how it can
- >>>actually happen on your facts, that the network was trained to
- >>>zero error on a training set of integer values.
-
- >>The "wild" solution is not postulated, it is THE set of weights which
- >>gives 0 error on the training set. The wild solution is forced upon
- >>the net by the training data.
-
- >I'm sceptical of this fact.
-
- There may be some isomorphisms that make the solution not unique, but
- that is probably not the grounds for your scepticism. Why don't you
- just run a BP program and see where it converges? If it reaches
- zero error, then you will have THE solution.
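-
- If it helps, here is a minimal BP run to try. It is only a sketch: the
- six f(x) targets from my earlier article are not repeated in this post,
- so the TARGETS line below is a placeholder to be filled in, and the
- 1-2-1 net with two sigmoid hidden units and a linear output is just an
- assumption about the example architecture, not necessarily the net we
- have been discussing.
-
-   # Minimal backprop (plain gradient descent on MSE) over the six
-   # integer points x = -2..3. TARGETS is a placeholder: substitute the
-   # f(x) values from the earlier post.
-   import numpy as np
-
-   rng = np.random.default_rng(0)
-   X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
-   TARGETS = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])  # placeholder values
-
-   def sigmoid(z):
-       return 1.0 / (1.0 + np.exp(-z))
-
-   W1 = rng.normal(scale=0.1, size=2); b1 = np.zeros(2)  # input -> hidden
-   W2 = rng.normal(scale=0.1, size=2); b2 = 0.0          # hidden -> output
-
-   eta = 0.5
-   for step in range(100000):
-       h = sigmoid(np.outer(X, W1) + b1)      # (6, 2) hidden activations
-       y = h @ W2 + b2                        # (6,) network outputs
-       err = y - TARGETS
-       dy = 2.0 * err / len(X)                # d(MSE)/d(output)
-       dh = np.outer(dy, W2) * h * (1.0 - h)  # backprop through sigmoid
-       W2 -= eta * (h.T @ dy); b2 -= eta * dy.sum()
-       W1 -= eta * (X @ dh);   b1 -= eta * dh.sum(axis=0)
-
-   print("final MSE:", np.mean(err ** 2))
-   print("largest |weight|:", max(np.abs(W1).max(), np.abs(W2).max()))
-
- If it does drive the error to zero, the interesting part is how large
- the weights end up being.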
-
- ...
-
- >>Sure, but a lot of little weights can add up, particularly if values
- >>derived from them get multiplied by a larger weight.
-
- >You can also observe the weights of your network, output only,
- >and put an upper bound on its value.
-
- I think you think you can observe all this in high-dimensional spaces, but
- your lifespan isn't long enough.
-
- >...
-
- >> Do you always bound your weights in absolute value by
- >>small numbers?
-
- >For initialization, yes.
-
- Sounds fine, but what do you do if a weight gets big?
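-
- If you did want to enforce a bound throughout training, and not only at
- initialization, one simple option (only an illustration on my part,
- assuming NumPy weight arrays like those in the sketch above) is to clip
- every weight back into [-bound, bound] after each update:
-
-   import numpy as np
-
-   def clip_weights(weight_arrays, bound=5.0):
-       """Clip every entry of every weight array into [-bound, bound]."""
-       for W in weight_arrays:
-           np.clip(W, -bound, bound, out=W)
-
-   # e.g. after each gradient step: clip_weights([W1, b1, W2])
-
- Whether clipping like that helps or hurts the fit is, of course, a
- separate question.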
-
- Thanks for your comments.
- --
- ***************************************************
- Prof. William W. Armstrong, Computing Science Dept.
- University of Alberta; Edmonton, Alberta, Canada T6G 2H1
- arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071
-