NetNews Usenet Archive 1992 #18


Text File | 1992-08-18 | 6.1 KB | 133 lines

Newsgroups: comp.ai.neural-nets
Path: sparky!uunet!wupost!gumby!destroyer!ubc-cs!unixg.ubc.ca!kakwa.ucs.ualberta.ca!alberta!arms
From: arms@cs.UAlberta.CA (Bill Armstrong)
Subject: Re: Reducing Training time vs Generalisation
Message-ID: <arms.714146123@spedden>
Keywords: back propagation, training, generalisation
Sender: news@cs.UAlberta.CA (News Administrator)
Nntp-Posting-Host: spedden.cs.ualberta.ca
Organization: University of Alberta, Edmonton, Canada
References: <arms.714014919@spedden> <36931@sdcc12.ucsd.edu> <arms.714091659@spedden> <36944@sdcc12.ucsd.edu>
Date: Tue, 18 Aug 1992 13:55:23 GMT
Lines: 119

demers@cs.ucsd.edu (David DeMers) writes:

>In article <arms.714091659@spedden> arms@cs.UAlberta.CA (Bill Armstrong) writes:

>...

>> ... the truth is that with a least squared error criterion on the training
>>set, I can get the optimal learned function to create a disaster very
>>easily.

>No offense, certainly. I guess I just don't understand what you
>mean by "disaster" nor what you've meant in previous postings
>about "wild" results...

OK, then it's worth repeating the explanation of how "wild" values can
be expected to occur once in a while in a trained net. Scott Fahlman
pointed out that penalizing large weights can have a beneficial effect.

*****

Here is an example of a backpropagation neural network that has very
wild behavior at some points not in the training or test sets. It has
just one input unit (for variable x), two hidden units with a sigmoidal
squashing function, and one output unit. This kind of subnetwork, a
"neural net virus" if you like, may exist in many of the networks that
have been trained to date. It could be built into any large BP network,
and might hardly change the latter's output behavior at all -- except
in one small region of the input space, where a totally unexpected
output could occur that might lead to disaster.
I hope this note will be taken as a warning by all persons whose ANS
are used in safety critical applications in medicine, engineering, the
military etc. It is also an encouragement to design safety into their
neural networks.

In order to avoid details of the backpropagation algorithm, we shall
just use the property that once a BP net has reached an absolute
minimum of error on the training and test sets, its parameters are not
changed. So our net will have zero error by design and the BP
algorithm, applied with infinite precision arithmetic, would not change
its weights. The issue of getting stuck at a local minimum of error
does not apply in this case, since it is an absolute minimum.

All the weights in the system remain bounded, and in this case, the
bound on their absolute values is 40.

The output unit's function is 40 * H1 + 40 * H2 - 40, where Hi is the
output of the i-th hidden unit (i = 1, 2). The output unit has no
sigmoid, though one could be inserted with no loss of generality. The
two hidden units have outputs of the form 1/(1 + e^(w0 + w1*x)), with
w0 = -10 and w1 = +40 for the first unit, and w0 = 30 and w1 = -40 for
the second.

We assume the net has been trained on a subset of integers and also
tested on a subset of integers. This could be replaced by a finer grid,
and safety assured (for bounded weights). However, in a d-dimensional
input space with a quantization to L levels of each variable, one would
need L^d training and test points, which can easily be an
astronomically large number (e.g. 1000^10). Hence it is not generally
feasible to assure safety by testing.

Below is the overall function f(x) produced by the net, which is also
the specification of what it is *supposed* to do outside the interval
(0,1). In (0,1) the specification is to be less than 0.002 in absolute
value.

f(x) = 40 * [ 1/(1 + e^(40*(x - 1/4))) + 1/(1 + e^(-40*(x - 3/4))) - 1 ]

The largest deviation of our trained network f(x) from 0 on all
integers is |f(0)| = |f(1)| = 0.0018...
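
As a quick numerical check, the two-hidden-unit net described above can
be evaluated directly. The Python sketch below is only an illustration:
the names hidden_unit and f_net and the integer range -5..5 are
assumptions made here, not part of the original post; the weights are
the ones given in the text.

```python
import math


def hidden_unit(x, w0, w1):
    """Sigmoidal hidden unit in the form used in the text: 1/(1 + e^(w0 + w1*x))."""
    return 1.0 / (1.0 + math.exp(w0 + w1 * x))


def f_net(x):
    """Output unit: 40*H1 + 40*H2 - 40, with no sigmoid on the output."""
    h1 = hidden_unit(x, -10.0, 40.0)   # equals 1/(1 + e^(40*(x - 1/4)))
    h2 = hidden_unit(x, 30.0, -40.0)   # equals 1/(1 + e^(-40*(x - 3/4)))
    return 40.0 * h1 + 40.0 * h2 - 40.0


# Integer inputs stand in for the training and test sets.
# The range is kept small (an assumption) so math.exp does not overflow.
for x in range(-5, 6):
    print(f"f({x}) = {f_net(x):.6f}")

# A single non-integer input inside (0,1), never seen in training or testing.
print(f"f(0.5) = {f_net(0.5):.6f}")
```

On the integers the printed values stay below 0.002 in magnitude, while
the value at x = 0.5 comes out close to -40.
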
So f is within 2/1000 of being 0 everywhere on our training and test
sets. Can we be satisfied with it? No! If we happen to give an input
of x = 1/2, we get

f(1/2) = -39.99...

The magnitude of this is over 22000 times larger than anything
appearing during training and testing, and is way out of spec.

Such unexpected values are likely to be very rare if a lot of testing
has been done on a trained net, but even then, the potential for
disaster can still be lurking in the system. Unless neural nets are
*designed* to be safe, there may be a serious risk involved in using
them.

The objective of this note is *not* to say "neural nets are bad for
safety critical applications". On the contrary, I personally believe
they can be made as safe as any digital circuit, and a lot safer than
programs. This might make ANS the method of choice for safety-critical
electronic applications, for example in aircraft control systems. But
to achieve that goal, a design methodology must be used which is
*guaranteed* to lead to a safe network.

Such a methodology can be based on decomposition of the input space
into parts where the function synthesized is forced to be monotonic in
each variable. For adaptive logic networks, this is easy to achieve.
The random walk technique for encoding real values used in the atree
release 2.0 software available by ftp is not appropriate for enforcing
monotonicity. Instead, thresholds should be used, which are monotonic
functions R -> {0,1}. By forcing monotonicity, one can assure that no
wild values can occur, since all values will be bounded by the values
at points examined during testing.

For BP networks, I am not sure a safe design methodology can be
developed. This is not because of the BP algorithm, per se, but rather
because of the architecture of multilayer networks with sigmoids: *all*
weights are used in computing *every* output (the effect of zero
weights having been eliminated).
Every output is calculated using some negative and some positive
weights, giving very little hope of control over the values beyond the
set of points tested.

--
***************************************************
Prof. William W. Armstrong, Computing Science Dept.
University of Alberta; Edmonton, Alberta, Canada T6G 2H1
arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071