Newsgroups: comp.ai.neural-nets
Path: sparky!uunet!mcsun!dxcern!dxlaa.cern.ch!block
From: block@dxlaa.cern.ch (Frank Block)
Subject: Re: Summary of CUPS + new question
Message-ID: <1992Sep11.130022.14944@dxcern.cern.ch>
Sender: news@dxcern.cern.ch (USENET News System)
Reply-To: block@dxlaa.cern.ch (Frank Block)
Organization: CERN, European Laboratory for Particle Physics, Geneva
References: <BuCFut.F6t.1@cs.cmu.edu> <arms.716190162@spedden>
Date: Fri, 11 Sep 1992 13:00:22 GMT
Lines: 46

In article <arms.716190162@spedden>, arms@cs.UAlberta.CA (Bill Armstrong) writes:
[...text deleted...]
|> One wants a global minimum. But doing the computations of
|> gradient descent more accurately, based on an entire epoch, guarantees
|> that you come to rest at the local minimum of the valley you started
|> in. So why not do a faster computation that has a chance of kicking
|> the system out of the valley you are currently in?

It is not the case that we have just two options:

- update the weights pattern by pattern
- update the weights after looping over the whole training set

We can also update the network after some number of patterns. For instance,
with a training set of 1000 patterns you can update the weights after every
ten patterns. This helps keep the net from getting trapped in local minima.
But even when the weights are updated only after the whole data set has been
presented, the net need not get stuck in a local minimum: presenting the
patterns in random order (and this is really important) is already good
medicine against local minima.
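
As a minimal sketch of the three schemes (written here in Python; grad() and
apply_update() are hypothetical stand-ins for the usual backprop machinery,
not anyone's actual code): batch_size = 1 gives per-pattern updates,
batch_size = len(patterns) gives one update per epoch, and batch_size = 10
on a set of 1000 gives the intermediate scheme described above.

    import random

    def train(patterns, weights, grad, apply_update,
              batch_size, epochs, lr=0.1):
        # batch_size = 1             -> per-pattern updates
        # batch_size = len(patterns) -> one update per epoch
        # batch_size = 10 (of 1000)  -> update every ten patterns
        for _ in range(epochs):
            random.shuffle(patterns)          # random presentation order
            acc = [0.0] * len(weights)        # accumulated gradient
            for i, p in enumerate(patterns, 1):
                g = grad(weights, p)          # error gradient on one pattern
                acc = [a + gi for a, gi in zip(acc, g)]
                if i % batch_size == 0:       # apply after each batch
                    weights = apply_update(weights, acc, lr)
                    acc = [0.0] * len(weights)
            if any(acc):                      # flush a partial last batch
                weights = apply_update(weights, acc, lr)
        return weights

    # e.g. a plain gradient-descent step:
    # apply_update = lambda w, g, lr: [wi - lr * gi for wi, gi in zip(w, g)]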

|> I should add that there are other heuristics in the ALN algorithm that
|> are not gradient-descent type (atree release 2.7 on-line help,
|> technical notes on the learning algorithm). I.e. some nodes are made
|> responsible and adaptations are caused to occur even in cases where
|> that could increase the error. This is quite different from the
|> approach of adding noise to kick the system out of local minima,
|> because the kick is given in a promising direction according to the
|> heuristics.

Perhaps you could explain in a few words the idea behind this method.
How do you kick a network out of a local minimum in a 'promising direction'
(by which you presumably mean the direction in which the global minimum lies)?
How do you know which direction is promising?

Thanks
Frank Block

===============================================================================
Frank Block
Div. PPE                                  e-mail: BLOCKF@vxcern.cern.ch
CERN                                              BLOCKF@cernvm.cern.ch
CH-1211 Geneve 23
Switzerland
===============================================================================