Newsgroups: comp.ai.neural-nets
Path: sparky!uunet!wupost!gumby!destroyer!ubc-cs!unixg.ubc.ca!kakwa.ucs.ualberta.ca!alberta!arms
From: arms@cs.UAlberta.CA (Bill Armstrong)
Subject: Re: Reducing Training time vs Generalisation
Message-ID: <arms.714146123@spedden>
Keywords: back propagation, training, generalisation
Sender: news@cs.UAlberta.CA (News Administrator)
Nntp-Posting-Host: spedden.cs.ualberta.ca
Organization: University of Alberta, Edmonton, Canada
References: <arms.714014919@spedden> <36931@sdcc12.ucsd.edu> <arms.714091659@spedden> <36944@sdcc12.ucsd.edu>
Date: Tue, 18 Aug 1992 13:55:23 GMT
Lines: 119

demers@cs.ucsd.edu (David DeMers) writes:

>In article <arms.714091659@spedden> arms@cs.UAlberta.CA (Bill Armstrong) writes:
>...

>> ... the truth is that with a least squared error criterion on the training
>>set, I can get the optimal learned function to create a disaster very
>>easily.

>No offense, certainly. I guess I just don't understand what you
>mean by "disaster" nor what you've meant in previous postings
>about "wild" results...

OK, then it's worth repeating the explanation of how "wild" values can
be expected to occur once in a while in a trained net. Scott Fahlman
pointed out that penalizing large weights can have a beneficial effect.

*****

Here is an example of a backpropagation neural network that has very
wild behavior at some points not in the training or test sets. It has
just one input unit (for variable x), two hidden units with a
sigmoidal squashing function, and one output unit.

This kind of subnetwork, a "neural net virus" if you like, may exist
in many of the networks that have been trained to date. It could be
built into any large BP network, and might hardly change the latter's
output behavior at all -- except in one small region of the input
space, where a totally unexpected output could occur that might lead
to disaster.

I hope this note will be taken as a warning by all persons whose ANS
are used in safety-critical applications in medicine, engineering, the
military etc. It is also an encouragement to design safety into their
neural networks.

In order to avoid details of the backpropagation algorithm, we shall
just use the property that once a BP net has reached an absolute
minimum of error on the training and test sets, its parameters are not
changed. So our net will have zero error by design, and the BP
algorithm, applied with infinite precision arithmetic, would not
change its weights. The issue of getting stuck at a local minimum of
error does not apply in this case, since it is an absolute minimum.

All the weights in the system remain bounded, and in this case, the
bound on their absolute values is 40. The output unit's function is
40 * H1 + 40 * H2 - 40, where Hi is the output of the i-th hidden unit
(i = 1, 2). The output unit has no sigmoid, though one could be
inserted with no loss of generality. The two hidden units have
outputs of the form 1/(1 + e^(w0 + w1*x)), with (w0, w1) = (-10, +40)
for H1 and (w0, w1) = (30, -40) for H2.

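For concreteness, here is a minimal Python sketch of that 1-2-1 net as
a plain forward pass. The function and variable names are mine, not
part of the network described above; only the weights come from it.

    import math

    # Hidden unit parameters: H_i(x) = 1 / (1 + e**(w0 + w1*x))
    W_HIDDEN = [(-10.0,  40.0),   # (w0, w1) for H1
                ( 30.0, -40.0)]   # (w0, w1) for H2

    def hidden(x, w0, w1):
        return 1.0 / (1.0 + math.exp(w0 + w1 * x))

    def net(x):
        """Linear output unit: 40*H1 + 40*H2 - 40 (no output sigmoid)."""
        h1 = hidden(x, *W_HIDDEN[0])
        h2 = hidden(x, *W_HIDDEN[1])
        return 40.0 * h1 + 40.0 * h2 - 40.0

    # e.g. net(0) is about -0.0018, while net(0.5) is about -39.996
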
We assume the net has been trained on a subset of integers and also
tested on a subset of integers. The integer grid could be replaced by
a finer grid, and safety thereby assured (given the bounded weights).
However, in a d-dimensional input space with a quantization to L
levels of each variable, one would need L ^ d training and test
points, which can easily be an astronomically large number
(e.g. 1000 ^ 10). Hence it is not generally feasible to assure safety
by testing.

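To put a number on that growth, a trivial Python calculation; the
particular values of L and d are just the ones quoted above, plus a
small case for contrast:

    # Points needed to test every combination of L quantization levels
    # in each of d input dimensions.
    def grid_points(L, d):
        return L ** d

    print(grid_points(10, 2))     # 100: easy to test exhaustively
    print(grid_points(1000, 10))  # 10**30: astronomically large
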
Below is the overall function f(x) produced by the net. Outside the
interval (0,1), this function is taken as its own specification of
what the net is *supposed* to do; inside (0,1), the specification is
that the output be less than 0.002 in absolute value.

f(x) = 40 * [ 1/(1 + e^(40*(x - 1/4))) + 1/(1 + e^(-40*(x - 3/4))) - 1 ]

The largest deviation of our trained network f(x) from 0 on all integers is

|f(0)| = |f(1)| = 0.0018...

So f is within 2/1000 of being 0 everywhere on our training and test
sets. Can we be satisfied with it? No! If we happen to give an input
of x = 1/2, we get

f(1/2) = -39.99...

The magnitude of this is over 22000 times larger than anything
appearing during training and testing, and is way out of spec.

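A short Python sketch of the closed-form f(x) above reproduces these
numbers; treating the integers -5..5 as a sample of the training/test
points is my choice here, the particular subset was left unspecified.

    import math

    def f(x):
        """The trained net's overall function, as given above."""
        return 40.0 * (1.0 / (1.0 + math.exp( 40.0 * (x - 0.25)))
                     + 1.0 / (1.0 + math.exp(-40.0 * (x - 0.75)))
                     - 1.0)

    # On a sample of integer training/test points the output looks harmless...
    worst = max(abs(f(x)) for x in range(-5, 6))
    print(worst)                # about 0.0018, within the 0.002 spec

    # ...but halfway between two of those points the net misbehaves badly.
    print(f(0.5))               # about -39.996
    print(abs(f(0.5)) / worst)  # roughly 22000 times the training-set worst
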
Such unexpected values are likely to be very rare if a lot of testing
has been done on a trained net, but even then, the potential for
disaster can still be lurking in the system. Unless neural nets are
*designed* to be safe, there may be a serious risk involved in using
them.

The objective of this note is *not* to say "neural nets are bad for
safety-critical applications". On the contrary, I personally believe
they can be made as safe as any digital circuit, and a lot safer than
programs. This might make ANS the method of choice for
safety-critical electronic applications, for example in aircraft
control systems.

But to achieve that goal, a design methodology must be used which is
*guaranteed* to lead to a safe network. Such a methodology can be
based on decomposition of the input space into parts where the
function synthesized is forced to be monotonic in each variable. For
adaptive logic networks, this is easy to achieve. The random walk
technique for encoding real values used in the atree release 2.0
software available by ftp is not appropriate for enforcing
monotonicity. Instead, thresholds should be used, which are monotonic
functions R -> {0,1}. By forcing monotonicity, one can assure that no
wild values can occur, since all values will be bounded by the values
at points examined during testing.

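To make the bounding argument concrete: if the synthesized function is
monotonic (say nondecreasing) in each variable on a region, then its
value anywhere inside a tested grid cell lies between its values at
the cell's lower and upper corners. A small Python sketch, with a
made-up coordinate-wise monotone function g standing in for the
synthesized network:

    import math, random

    def g(x, y):
        """Stand-in for a synthesized function that is nondecreasing
        in each of its inputs on [0,1] x [0,1]."""
        return 1.0 / (1.0 + math.exp(-(2.0 * x + 3.0 * y)))

    # Test grid: cells of size 0.1 x 0.1 covering [0,1] x [0,1].
    step = 0.1
    random.seed(0)
    for _ in range(10000):
        # Pick a random cell and a random point inside it.
        i, j = random.randrange(10), random.randrange(10)
        x = (i + random.random()) * step
        y = (j + random.random()) * step
        lo = g(i * step, j * step)              # lower corner, tested
        hi = g((i + 1) * step, (j + 1) * step)  # upper corner, tested
        # Monotonicity in each variable bounds the value between the
        # tested corner values -- nothing "wild" can hide in between.
        assert lo <= g(x, y) <= hi
    print("all interior points bounded by tested corner values")
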
For BP networks, I am not sure a safe design methodology can be
developed. This is not because of the BP algorithm per se, but rather
because of the architecture of multilayer networks with sigmoids:
*all* weights are used in computing *every* output (setting aside
weights that are exactly zero). Every output is calculated using some
negative and some positive weights, giving very little hope of control
over the values beyond the set of points tested.

--
***************************************************
Prof. William W. Armstrong, Computing Science Dept.
University of Alberta; Edmonton, Alberta, Canada T6G 2H1
arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071