NetNews Usenet Archive 1992 #18

Internet Message Format | 1992-08-19 | 5.4 KB

Path: sparky!uunet!charon.amdahl.com!pacbell.com!mips!sdd.hp.com!elroy.jpl.nasa.gov!ames!network.ucsd.edu!sdcc12!cs!demers
From: demers@cs.ucsd.edu (David DeMers)
Newsgroups: comp.ai.neural-nets
Subject: Re: Wild values (was Reducing Training time ...)
Keywords: back propagation, training, generalisation
Message-ID: <37028@sdcc12.ucsd.edu>
Date: 19 Aug 92 17:26:31 GMT
References: <arms.714146123@spedden> <36967@sdcc12.ucsd.edu> <arms.714208873@spedden>
Sender: news@sdcc12.ucsd.edu
Organization: =CSE Dept., U.C. San Diego
Lines: 135
Nntp-Posting-Host: beowulf.ucsd.edu

In article <arms.714208873@spedden> arms@cs.UAlberta.CA (Bill Armstrong) writes:
>Pick any set of integers that contains at least the six points x =
>-2 -1 0 1 2 3, each one with the f(x) value specified below.
>Test on any finite set of integers you like.

Integers don't seem to be very representative of the domain...

...

>>This is not simply a pathological example, it is completely
>>absurd.

>You simply haven't grasped it yet. This kind of little "absurd"
>example is going to show many people how dangerous it is to use the
>usual approach to neural networks. When a safety-critical system
>blows up because you neglected some wild output of your neural net, it
>will be too late to go back and try to understand the example.

>Anyway, it is not a pathological example. Once you get the idea, you
>can construct lots of examples.

Well, I have grasped the idea, and I understand how one can
*construct* these examples, but you haven't shown me how you can
actually get this or any similar example by following an
optimization of weights to minimize an objective function like
mean squared error over a set of data...

>It's only when you reach that point
>that you can begin to think about preventing wild values. Sorry,
>calling my little example "absurd" won't convince people who have a
>lot to lose from a misbehaved system. If they are smart, they will
>want to see proof that a wild value can't cause a problem. Are you
>ready to supply a proof? I don't think so, because you still don't
>grasp the problem.

You seem to be calling "the problem" the task of proving that a
particular net, plucked out of the air, will perform according to
some spec...

...

>I have had backprop converge on this kind of pathological example,
>from some not particularly carefully chosen starting state. If the
>f-values are small, I can see there is a problem with a real BP net,
>but the argument is supposed to be mathematical, so numerical accuracy
>is not a problem.

>If you happened to initialize the system by chance to the given
>weights, which do produce the desired values on the training set,

pretty close to a set of zero measure...

>the BP algorithm would have 0 mean square error on the training set, and
>would not change the weights. In other words, the weights (+ or - 40)
>are stable, and you can reach them. Maybe there are starting points
>from which you can't reach them, but that's a different problem to
>find them.

OK, my claim is that the weights you've given are not an attractor,
or that if they are, the basin of attraction in weight space is
pretty small.

[By attractor I mean a stable point of the dynamical system
consisting of applying BP to the training data for this net, where
small perturbations from the attractor result in the system
converging back to it... see, maybe, Wiggins' book or Guckenheimer
and Holmes for more details (many, many more details :-). The basin
of attraction is the region within which the algorithm will result
in the attractor being reached (at least asymptotically).]

I don't have a proof...

>>The "wildness" here is postulated; I still don't see how it can
>>actually happen on your facts, that the network was trained to
>>zero error on a training set of integer values.

>The "wild" solution is not postulated, it is THE set of weights which
>gives 0 error on the training set. The wild solution is forced upon
>the net by the training data.

I'm sceptical of this fact.
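The attractor and basin-of-attraction definitions in the bracketed note above can be made concrete on a toy problem. What follows is a minimal sketch, not Armstrong's network (his f-values are not reproduced in this excerpt): it assumes a hypothetical one-weight model whose training loss is L(w) = (w^2 - 1)^2, trained by plain gradient descent with an assumed fixed step size eta = 0.05. The names `grad` and `descend` are illustrative only.

```python
# Hypothetical one-weight "network" whose training loss is
# L(w) = (w**2 - 1)**2: minima at w = -1 and w = +1, and an
# unstable stationary point at w = 0.

def grad(w):
    """dL/dw for the toy loss L(w) = (w**2 - 1)**2."""
    return 4.0 * w * (w * w - 1.0)

def descend(w, eta=0.05, steps=500):
    """Plain batch gradient descent with a fixed (assumed) step size eta."""
    for _ in range(steps):
        w -= eta * grad(w)
    return w

# Small perturbations of w = +1 flow back to it, so +1 is an attractor.
for w0 in (0.9, 1.1, 1.3):
    print(f"start {w0:+.1f} -> {descend(w0):+.6f}")

# Starting points on the other side of w = 0 lie in the basin of -1.
for w0 in (-0.2, -1.3):
    print(f"start {w0:+.1f} -> {descend(w0):+.6f}")

# w = 0 has zero gradient, so descent never leaves it, but any
# perturbation, however small, carries the weight away from it.
print(f"start +0.0 -> {descend(0.0):+.6f}")
print(f"start +1e-6 -> {descend(1e-6):+.6f}")
```

Note that "attractor" here is a property of the discrete update w <- w - eta * L'(w), not of the loss alone: at w = 1 the curvature L''(1) = 8, so the update contracts toward the minimum only while |1 - 8 * eta| < 1, i.e. for 0 < eta < 0.25. With a larger step the same zero-error weight stops being an attractor of the training procedure.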
>The use of integers for training and testing
>and the fact that they are uniformly spaced is also not critical.

...

>>In the neural network framework, Mike Jordan and Robert Jacobs
>>are working on a generalization of modular architecture of
>>Jacobs, Jordan, Nowlan & Hinton, which recursively splits the
>>input space into nested regions and "learns" a mapping within
>>each region.

>Great. Do they use monotonicity, or a scheme which allows them to get
>tight bounds on *all* outputs, so they can satisfy a "spec" if we
>could agree on one?

It's more akin to the Bayesian methods.

...

>Sure, but a lot of little weights can add up, particularly if values
>derived from them get multiplied by a larger weight.

You can also inspect the weights of your network (the output layer
only) and put an upper bound on its output value.

...

>The situation you describe, where you are always in the linear region
>of all sigmoids sounds *very* undesirable.

It is undesirable to *end up* there, but there is a lot of evidence
that it's a good place to start in weight space.

>The output should benefit by
>some signals getting very attenuated in effect by being near the flat
>parts of sigmoids.

Sure, otherwise there are no non-linearities...

>Do you always bound your weights in absolute value by
>small numbers?

For initialization, yes.

--
Dave DeMers                      ddemers@UCSD   demers@cs.ucsd.edu
Computer Science & Engineering   C-014          demers%cs@ucsd.bitnet
UC San Diego                     ...!ucsd!cs!demers
La Jolla, CA 92093-0114   (619) 534-0688, or -8187, FAX: (619) 534-7029
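The remark above, that the output can be bounded from the output-layer weights alone, can be sketched as follows. This assumes a single hidden layer of logistic units feeding a linear output unit; the post does not fix an architecture, and the function names, layer sizes and random weights below are hypothetical. Because every logistic activation lies between 0 and 1, the output can never go below the bias plus the sum of the negative output weights, or above the bias plus the sum of the positive ones, whatever the input and hidden-layer weights are.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function; its value always lies in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(x, W, c, v, b):
    """Assumed architecture: logistic hidden layer (W, c), linear output unit (v, b)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + cj)
              for row, cj in zip(W, c)]
    return b + sum(vj * hj for vj, hj in zip(v, hidden))

def output_bounds(v, b):
    """Interval containing the output for every input.

    Only the output-layer weights v and bias b are used: each hidden
    activation lies in [0, 1], so the output is smallest when every
    negative v_j is multiplied by 1 and every positive v_j by 0, and
    largest in the opposite case.
    """
    low = b + sum(vj for vj in v if vj < 0)
    high = b + sum(vj for vj in v if vj > 0)
    return low, high

# Hypothetical network: large hidden weights (like the +/- 40 in the
# thread), small output weights.
random.seed(0)
n_in, n_hidden = 2, 8
W = [[random.uniform(-40, 40) for _ in range(n_in)] for _ in range(n_hidden)]
c = [random.uniform(-40, 40) for _ in range(n_hidden)]
v = [random.uniform(-1, 1) for _ in range(n_hidden)]
b = 0.1

low, high = output_bounds(v, b)
outputs = [forward([random.uniform(-100, 100) for _ in range(n_in)], W, c, v, b)
           for _ in range(1000)]
print(f"bound [{low:.3f}, {high:.3f}], observed [{min(outputs):.3f}, {max(outputs):.3f}]")
```

The hidden weights W and c never enter `output_bounds`. The interval's width is the sum of the absolute output weights, so many small output weights do add up, as Armstrong points out. The bound also says nothing about whether an output *inside* the interval is "wild" relative to the training data.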