- Path: sparky!uunet!charon.amdahl.com!pacbell.com!mips!sdd.hp.com!elroy.jpl.nasa.gov!ames!network.ucsd.edu!sdcc12!cs!demers
- From: demers@cs.ucsd.edu (David DeMers)
- Newsgroups: comp.ai.neural-nets
- Subject: Re: Wild values (was Reducing Training time ...)
- Keywords: back propagation, training, generalisation
- Message-ID: <37028@sdcc12.ucsd.edu>
- Date: 19 Aug 92 17:26:31 GMT
- References: <arms.714146123@spedden> <36967@sdcc12.ucsd.edu> <arms.714208873@spedden>
- Sender: news@sdcc12.ucsd.edu
- Organization: CSE Dept., U.C. San Diego
- Lines: 135
- Nntp-Posting-Host: beowulf.ucsd.edu
-
- In article <arms.714208873@spedden> arms@cs.UAlberta.CA (Bill Armstrong) writes:
- >Pick any set of integers that contains at least the six points x =
- >-2 -1 0 1 2 3, each one with the f(x) value specified below.
- >Test on any finite set of integers you like.
-
- Integers don't seem to be very representative of the domain...
-
- ...
-
- >>This is not simply a pathological example, it is completely
- >>absurd.
-
- >You simply haven't grasped it yet. This kind of little "absurd"
- >example is going to show many people how dangerous it is to use the
- >usual approach to neural networks. When a safety-critical system
- >blows up because you neglected some wild output of your neural net, it
- >will be too late to go back and try to understand the example.
-
- >Anyway, it is not a pathological example. Once you get the idea, you
- >can construct lots of examples.
-
- Well, I have grasped the idea and I understand how one can *construct*
- these examples, but you haven't shown me how you can actually get
- this or any similar example by following an optimization of
- weights to minimize an objective function like mean squared
- error over a set of data...
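To be concrete about what I mean by "following an optimization of weights": something like the sketch below, gradient descent on mean squared error for a tiny 1-2-1 sigmoid net. (This is only an illustration in modern Python; the net, data and learning rate are made up, and it is *not* Armstrong's example.)

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical 1-2-1 net: y = v1*s(w1*x + b1) + v2*s(w2*x + b2) + c
def forward(p, x):
    w1, b1, w2, b2, v1, v2, c = p
    return v1 * sigmoid(w1 * x + b1) + v2 * sigmoid(w2 * x + b2) + c

def mse(p, data):
    return sum((forward(p, x) - t) ** 2 for x, t in data) / len(data)

def grad(p, data, eps=1e-6):
    # finite-difference gradient of the mean squared error
    g = []
    for i in range(len(p)):
        hi = list(p); hi[i] += eps
        lo = list(p); lo[i] -= eps
        g.append((mse(hi, data) - mse(lo, data)) / (2 * eps))
    return g

random.seed(0)
data = [(-2.0, 0.1), (-1.0, 0.2), (0.0, 0.5), (1.0, 0.8), (2.0, 0.9)]
p = [random.uniform(-0.1, 0.1) for _ in range(7)]   # small random start
before = mse(p, data)
for _ in range(2000):
    p = [pi - 0.2 * gi for pi, gi in zip(p, grad(p, data))]
after = mse(p, data)
```

The point is that the weights you end up with are whatever this descent process finds, not whatever weights one can construct by hand.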
-
- It's only when you reach that point
- >that you can begin to think about preventing wild values. Sorry,
- >calling my little example "absurd" won't convince people who have a
- >lot to lose from a misbehaved system. If they are smart, they will
- >want to see proof that a wild value can't cause a problem. Are you
- >ready to supply a proof? I don't think so, because you still don't
- >grasp the problem.
-
- You seem to be calling "the problem" proving that a particular
- net plucked out of the air will perform according to
- some spec...
-
- ...
-
- >I have had backprop converge on this kind of pathological example,
- >from some not particularly carefully chosen starting state. If the
- >f-values are small, I can see there is a problem with a real BP net,
- >but the argument is supposed to be mathematical, so numerical accuracy
- >is not a problem.
-
- >If you happened to initialize the system by chance to the given
- >weights, which do produce the desired values on the training set,
-
- pretty close to a set of zero measure...
-
- the
- >BP algorithm would have 0 mean square error on the training set, and
- >would not change the weights. In other words, the weights (+ or - 40)
- >are stable, and you can reach them. Maybe there are starting points
- >from which you can't reach them, but that's a different problem to
- >find them.
-
- OK, my claim is that the weights you've given are not
- an attractor, or that if they are,
- the basin of attraction in weight space is pretty small.
-
- [by attractor I mean a stable point of the dynamical system
- consisting of applying BP to the training data for this net,
- where small perturbations from the attractor result in the system
- converging to it... see, maybe, Wiggins' book or Guckenheimer
- and Holmes for more details (many many more details :-)
- the basin of attraction is the region within which the
- algorithm will result in the attractor being reached (at
- least asymptotically)]
-
- I don't have a proof...
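...but the property is easy to probe numerically: at a zero-error solution the MSE gradient vanishes, so it *is* a fixed point, and you can estimate the basin by perturbing the weights and seeing whether gradient descent returns. A minimal sketch with a one-parameter model (an illustration of the idea only, not Armstrong's net):

```python
# Toy model y = w * x; the data below is fit exactly at w = 2, so the
# MSE gradient is zero there and w = 2 is a fixed point of gradient
# descent.  Perturbing w and descending probes the basin of attraction.
def mse_grad(w, data):
    return sum(2.0 * (w * x - t) * x for x, t in data) / len(data)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]    # exactly fit by w = 2
g_at_solution = mse_grad(2.0, data)            # vanishes at zero error

w = 2.5                                        # perturbed start
for _ in range(200):
    w -= 0.05 * mse_grad(w, data)              # descends back toward 2
```

For Armstrong's net the question is whether the "wild" weight vector has a basin of any appreciable size, which a perturb-and-retrain experiment could at least estimate.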
-
- >>The "wildness" here is postulated; I still don't see how it can
- >>actually happen on your facts, that the network was trained to
- >>zero error on a training set of integer values.
-
- >The "wild" solution is not postulated, it is THE set of weights which
- >gives 0 error on the training set. The wild solution is forced upon
- >the net by the training data.
-
- I'm sceptical of this fact.
-
- The use of integers for training and testing
- >and the fact that they are uniformly spaced is also not critical.
-
- ...
-
- >>In the neural network framework, Mike Jordan and Robert Jacobs
- >>are working on a generalization of modular architecture of
- >>Jacobs, Jordan, Nowlan & Hinton, which recursively splits the
- >>input space into nested regions and "learns" a mapping within
- >>each region.
-
- >Great. Do they use monotonicity, or a scheme which allows them to get
- >tight bounds on *all* outputs, so they can satisfy a "spec" if we
- >could agree on one?
-
-
- It's more akin to the Bayesian methods.
-
- ...
-
- >Sure, but a lot of little weights can add up, particularly if values
- >derived from them get multiplied by a larger weight.
-
- You can also inspect the weights of your trained network's output
- layer alone, and put an upper bound on the output value.
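Since each sigmoid hidden unit's output lies in (0, 1), the output unit's value is bounded using only the output-layer weights, whatever the inputs are. A sketch (the weight values are hypothetical; the +/- 40 pair echoes Armstrong's example):

```python
# Each sigmoid hidden unit's output si lies in (0, 1), so the output
# value v1*s1 + ... + vn*sn + c is bounded using only the output-layer
# weights, independent of the inputs.
def output_bounds(v, c):
    lo = c + sum(min(vi, 0.0) for vi in v)   # worst case: si -> 0 or 1
    hi = c + sum(max(vi, 0.0) for vi in v)
    return lo, hi

# hypothetical output weights, including a "wild" +/- 40 pair
lo, hi = output_bounds([40.0, -40.0, 3.0], 0.5)
```

So a hard a priori bound exists; whether it is *tight enough* for a given spec is the real question.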
-
-
- ...
-
- >The situation you describe, where you are always in the linear region
- >of all sigmoids sounds *very* undesirable.
-
- It is undesirable to *end up* there, but there is a lot
- of evidence that it's a good place to start in weight-space.
-
- The output should benefit by
- >some signals getting very attenuated in effect by being near the flat
- >parts of sigmoids.
-
- Sure, otherwise there are no non-linearities...
-
- > Do you always bound your weights in absolute value by
- >small numbers?
-
- For initialization, yes.
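I.e., something like the sketch below: draw initial weights on a scale of 1/sqrt(fan-in), so that pre-activations for typical bounded inputs land near zero, where the sigmoid is roughly linear. (The 1/sqrt(fan-in) scale is one common heuristic, not the only scheme.)

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Heuristic (an assumption, not the only scheme): draw weights
# uniformly in +/- 1/sqrt(fan_in), so the pre-activation of a typical
# bounded input lands near 0, where s(x) ~ 0.5 + x/4 is approximately
# linear and the derivative is largest.
random.seed(1)
fan_in = 10
scale = 1.0 / math.sqrt(fan_in)
w = [random.uniform(-scale, scale) for _ in range(fan_in)]

x = [1.0] * fan_in                     # a typical bounded input vector
pre = sum(wi * xi for wi, xi in zip(w, x))
# |pre| <= fan_in * scale = sqrt(fan_in), and is usually much smaller
```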
-
- --
- Dave DeMers ddemers@UCSD demers@cs.ucsd.edu
- Computer Science & Engineering C-014 demers%cs@ucsd.bitnet
- UC San Diego ...!ucsd!cs!demers
- La Jolla, CA 92093-0114 (619) 534-0688, or -8187, FAX: (619) 534-7029
-