Newsgroups: comp.ai.neural-nets
Path: sparky!uunet!mcsun!dxcern!dxlaa.cern.ch!block
From: block@dxlaa.cern.ch (Frank Block)
Subject: Re: Summary of CUPS + new question
Message-ID: <1992Sep11.130022.14944@dxcern.cern.ch>
Sender: news@dxcern.cern.ch (USENET News System)
Reply-To: block@dxlaa.cern.ch (Frank Block)
Organization: CERN, European Laboratory for Particle Physics, Geneva
References: <BuCFut.F6t.1@cs.cmu.edu> <arms.716190162@spedden>
Date: Fri, 11 Sep 1992 13:00:22 GMT
Lines: 46

In article <arms.716190162@spedden>, arms@cs.UAlberta.CA (Bill Armstrong) writes:
[...text deleted...]
|> One wants a global minimum. But doing the computations of
|> gradient descent more accurately, based on an entire epoch, guarantees
|> that you come to rest at the local minimum of the valley you started
|> in. So why not do a faster computation that has a chance of kicking
|> the system out of the valley you are currently in?

It is not the case that we have just two options:

- update the weights pattern by pattern
- update the weights after looping over the whole training set

We can also update the network after some number of patterns. For instance,
with a training set of 1000 patterns you can update the weights after every
ten patterns. This helps keep the net from getting trapped in local minima.
But even when the weights are updated only after the whole data set has been
presented, the net need not get stuck in a local minimum: presenting the
patterns in random order (and this is really important) is already good
medicine against local minima.
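
As a minimal sketch of the three schemes (written here in Python; grad() and
apply_update() are hypothetical stand-ins for the usual backprop machinery,
not anyone's actual code): batch_size = 1 gives per-pattern updates,
batch_size = len(patterns) gives one update per epoch, and batch_size = 10
on a set of 1000 gives the intermediate scheme described above.

    import random

    def train(patterns, weights, grad, apply_update,
              batch_size, epochs, lr=0.1):
        # batch_size = 1             -> per-pattern updates
        # batch_size = len(patterns) -> one update per epoch
        # batch_size = 10 (of 1000)  -> update every ten patterns
        for _ in range(epochs):
            random.shuffle(patterns)          # random presentation order
            acc = [0.0] * len(weights)        # accumulated gradient
            for i, p in enumerate(patterns, 1):
                g = grad(weights, p)          # error gradient on one pattern
                acc = [a + gi for a, gi in zip(acc, g)]
                if i % batch_size == 0:       # apply after each batch
                    weights = apply_update(weights, acc, lr)
                    acc = [0.0] * len(weights)
            if any(acc):                      # flush a partial last batch
                weights = apply_update(weights, acc, lr)
        return weights

    # e.g. a plain gradient-descent step:
    # apply_update = lambda w, g, lr: [wi - lr * gi for wi, gi in zip(w, g)]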

|> I should add that there are other heuristics in the ALN algorithm that
|> are not gradient-descent type (atree release 2.7 on-line help,
|> technical notes on the learning algorithm). I.e. some nodes are made
|> responsible and adaptations are caused to occur even in cases where
|> that could increase the error. This is quite different from the
|> approach of adding noise to kick the system out of local minima,
|> because the kick is given in a promising direction according to the
|> heuristics.

Perhaps you could explain in a few words the idea behind this method.
How do you kick a network out of a local minimum in a 'promising direction'
(by which you presumably mean the direction in which the global minimum lies)?
How do you know which direction is promising?

Thanks
Frank Block

===============================================================================
Frank Block
Div. PPE                                  e-mail: BLOCKF@vxcern.cern.ch
CERN                                              BLOCKF@cernvm.cern.ch
CH-1211 Geneve 23
Switzerland
===============================================================================