NetNews Usenet Archive 1992 #20

Text File | 1992-09-09 | 2.1 KB | 53 lines

Newsgroups: comp.ai.neural-nets
Path: sparky!uunet!mcsun!news.funet.fi!network.jyu.fi!hovila
From: hovila@jyu.fi (Ari Hovila)
Subject: Adjusting weights in backprop simulators
Message-ID: <1992Sep9.094735.15604@jyu.fi>
Organization: University of Jyvaskyla, Finland
Date: Wed, 9 Sep 1992 09:47:35 GMT
Lines: 43

In article 6499 of comp.ai.neural-nets jjb@sequent.com (Jeff Berkowitz) writes:

(...some stuff deleted)

>Now, the question.  While I was trying to debug my backprop simulator,
>my wife discovered what appears to be a subtle difference between the
>precise description of backprop and several of the "C" implementations
>I've picked up via ftp.

>In the "original" paper (Rumelhart, Hinton, Williams, "Learning
>Internal Representations by Error Propagation", 1986) the backward
>pass is described as follows:

>    ...The first step is to compute delta for each of
>    the output units. [...] We can then compute the
>    weight changes for all connections that feed into
>    the final layer.  AFTER this is done, then compute
>    deltas for all units in the penultimate layer...[etc,
>    emphasis mine.]

...

>as I read the description, I should change the weight FIRST as I back
>up, and then use the NEW value in the accumulated error.  At least
>the Dayhoff description pretty much states this in black and white.

Well, this has been bugging me too. I have implemented a bp-simulator
in C, and I use the method you suspected was 'an implementation error'
(i.e. calculate the deltas first and then adjust the weights).

I'm really not sure what the main difference between these two
approaches is, but at least in a book by James Freeman (Neural
Networks: Algorithms, Applications, and Programming Techniques) it is
very clearly stated that you should calculate the deltas for the
hidden-layer nodes before adjusting the outgoing weights. Since weight
changes are usually small, the algorithm could well work both ways.
I doubt that anyone has seen a case where one of these methods works
while the other one fails... It would be nice to see other comments on
this.

Ari Hovila, University of Jyvaskyla, Finland
internet e-mail address: hovila@jyu.fi