Newsgroups: comp.ai.neural-nets
Path: sparky!uunet!spool.mu.edu!sol.ctr.columbia.edu!destroyer!cs.ubc.ca!alberta!arms
From: arms@cs.UAlberta.CA (Bill Armstrong)
Subject: Re: How to train a lifeless network (of "silicon atoms")?
Message-ID: <arms.722488897@spedden>
Sender: news@cs.UAlberta.CA (News Administrator)
Nntp-Posting-Host: spedden.cs.ualberta.ca
Organization: University of Alberta, Edmonton, Canada
References: <1992Nov21.002654.13198@news.columbia.edu> <1992Nov22.182325.24185@dxcern.cern.ch>
Date: Mon, 23 Nov 1992 03:21:37 GMT
Lines: 63

block@dxlaa.cern.ch (Frank Block) writes:

>In article <1992Nov21.002654.13198@news.columbia.edu>, rs69@cunixb.cc.columbia.edu (Rong Shen) writes:

>|> Please allow me to ask you this childish question:
>|>
>|> Suppose you have a neural network and you want to train it to
>|> perform a task; for the moment, let's say the task is to recognize
>|> handwriting. Now suppose the network has recognized the word "hello,"
>|> and the weight in the synapse between neurodes (neurons) X and Y is k.
>|> If you proceed to train the network to recognize the word "goodbye"
>|> (by back propagation, or whatever algorithms), and since all the
>|> neurodes are connected in some way (through some interneurons, maybe),
>|> the synaptic weight between X and Y is likely to change from k to some
>|> other number; similarly, the weights in other synapses will change.
>|> Therefore, it is extremely likely that one training session will erase
>|> the efforts of previous sessions.
>|>
>|> My question is, What engineering tricks shall we use to
>|> overcome this apparent difficulty?
>|>
>|> Thanks.
>|>
>|> --
>|> rs69@cunixb.cc.columbia.edu
>|>
>--

>What you normally do during training is to present (taking your example) the
>words 'hello' and 'goodbye' alternately.  You should not train the net first
>just on one and then, when it has learned to recognize it, on the other.
>The training is a statistical process which in the end (let's hope) converges
>to a good set of weights (a compromise which recognizes all patterns in an
>optimal way).
>The engineering trick is mostly the so-called 'gradient descent' (in backprop).
>This always moves your current weight vector in a direction which decreases the
>network error measure.

>Hope this helps a bit

>Frank

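To make Frank's point concrete, here is a little sketch: just a
single sigmoid "neurode" trained by gradient descent on two made-up
binary patterns standing in for "hello" and "goodbye" (the learning
rate and the number of presentations are arbitrary, and none of this
comes from real handwriting data).  Training to convergence on the
first pattern and only then on the second largely erases the first,
while alternating them converges to a compromise set of weights:

import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def output(w, x):
    # a single "neurode": weighted sum of the inputs through a sigmoid
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train_step(w, x, target, rate=0.5):
    # one gradient-descent step on the squared error for one pattern
    y = output(w, x)
    delta = (y - target) * y * (1.0 - y)      # d(error)/d(net input)
    return [wi - rate * delta * xi for wi, xi in zip(w, x)]

# two overlapping input patterns standing in for the two words
hello   = ([1.0, 1.0, 0.0, 1.0], 1.0)         # (pattern, target)
goodbye = ([1.0, 0.0, 1.0, 1.0], 0.0)

random.seed(0)
w_seq = [random.uniform(-0.5, 0.5) for _ in range(4)]
w_mix = list(w_seq)

# sequential training: every "hello" presentation, then every "goodbye"
for x, t in [hello] * 2000 + [goodbye] * 2000:
    w_seq = train_step(w_seq, x, t)

# interleaved training: alternate the two patterns
for _ in range(2000):
    for x, t in (hello, goodbye):
        w_mix = train_step(w_mix, x, t)

print("sequential :", [round(output(w_seq, x), 2) for x, _ in (hello, goodbye)])
print("interleaved:", [round(output(w_mix, x), 2) for x, _ in (hello, goodbye)])

The sequentially trained unit should end up getting "hello" wrong
again, while the interleaved one settles on weights that serve both
patterns at once.  The particular unit and numbers don't matter; the
alternating presentation is what lets gradient descent find the
compromise.
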
I once attended a tutorial by Bernard Widrow in which he referred to
the "least disturbance principle".  The idea is to correct an error in
such a way that the overall state of the net is perturbed as little as
possible.  This is another way of looking at backprop.  Not only does
it reduce the error (in theory anyway), it does so by favoring changes
to those weights which have the greatest effect on the error, so it can
change them by the smallest amount needed to achieve a given amount of
error correction.
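
To see what "least disturbance" means in the simplest case, here is
a sketch of my own for a single linear unit, in the spirit of
Widrow's LMS rule (the numbers and the fraction alpha below are only
illustrative, not something from the tutorial).  The weight change
that removes a given fraction of the error on the current pattern
while moving the weight vector as little as possible is the step
taken along the input, and any other change achieving the same
correction is necessarily bigger:

import random

# A single linear unit: y = w . x.  To remove a fraction alpha of the
# error e = t - y on the current pattern we need  dw . x = alpha * e.
# Among all dw satisfying that constraint, the smallest one (the least
# disturbance to the weight vector) lies along x itself:
#     dw = alpha * e * x / (x . x)    -- the Widrow-Hoff (alpha-LMS) step

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def lms_step(w, x, t, alpha=0.5):
    e = t - dot(w, x)
    scale = alpha * e / dot(x, x)
    return [wi + scale * xi for wi, xi in zip(w, x)], e

random.seed(1)
w = [random.uniform(-1.0, 1.0) for _ in range(5)]
x = [random.uniform(-1.0, 1.0) for _ in range(5)]
t = 1.0

w_new, e = lms_step(w, x, t)
dw = [a - b for a, b in zip(w_new, w)]
print("error before     :", round(e, 4))
print("error after      :", round(t - dot(w_new, x), 4))    # cut in half
print("size of the step :", round(dot(dw, dw) ** 0.5, 4))

# Any other weight change giving the same correction must be larger,
# e.g. the same step plus a component orthogonal to x:
noise = [0.2, -0.1, 0.0, 0.1, -0.2]
ortho = [n - dot(noise, x) / dot(x, x) * xi for n, xi in zip(noise, x)]
dw2 = [d + o for d, o in zip(dw, ortho)]
w_alt = [wi + di for wi, di in zip(w, dw2)]
print("same error after :", round(t - dot(w_alt, x), 4))
print("but a bigger step:", round(dot(dw2, dw2) ** 0.5, 4))

With alpha = 1 the error on the current pattern is wiped out
completely; a smaller alpha trades slower correction for even less
disturbance to whatever the other patterns have built up.  Backprop
only does this approximately, but that is the flavor of the
principle.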

This simple principle shows Widrow's profound insight into adaptation.
It has helped me a lot in thinking about adaptation algorithms for ALNs
(adaptive logic networks) too, but the scope of its applicability seems
much broader.

Bill
--
***************************************************
Prof. William W. Armstrong, Computing Science Dept.
University of Alberta; Edmonton, Alberta, Canada T6G 2H1
arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071