Path: sparky!uunet!elroy.jpl.nasa.gov!usc!cs.utexas.edu!uwm.edu!ogicse!das-news.harvard.edu!cantaloupe.srv.cs.cmu.edu!crabapple.srv.cs.cmu.edu!news
From: sef@sef-pmax.slisp.cs.cmu.edu
Newsgroups: comp.ai.neural-nets
Subject: Re: will Cascade Correlation work in stochastic mode?
Message-ID: <C0Gw28.M9E.1@cs.cmu.edu>
Date: 7 Jan 93 04:49:16 GMT
Article-I.D.: cs.C0Gw28.M9E.1
Sender: news@cs.cmu.edu (Usenet News System)
Organization: School of Computer Science, Carnegie Mellon
Lines: 76
Nntp-Posting-Host: sef-pmax.slisp.cs.cmu.edu

From: ra@cs.brown.edu (Ronny Ashar)

However, I would prefer a more robust algorithm. I was looking at Fahlman's
Cascade Correlation. My impression is that Cascor needs epoch training only;
it could be modified to work in stochastic mode, but, in that case, it will
end up creating huge nets with redundant units. Is that correct?

Good question. The answer is a bit complicated. This probably won't make
much sense to people who don't already understand the Cascor algorithm in
some detail...

There is nothing inherently batch-oriented in the basic structure of
Cascor. However, the Cascor code that I distribute, and that I run myself,
uses Quickprop for updating the weights both in the candidate-training
phase and in the output-training phase. Quickprop, in its current form, is
inherently a batch-update algorithm: you have to run the same batch of
training examples through the system multiple times to get the speedup it
offers.

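To make the batch dependence concrete, here is a rough sketch (in Python,
simplified, and not the code I distribute) of the Quickprop step for a
single weight. The parabolic jump compares the current slope with the
previous slope for the same weight, and that comparison only means
anything if both slopes were measured over the same set of examples; that
is what ties Quickprop to batch updating. The constants are typical but
arbitrary, and weight decay and some special cases are omitted.

def quickprop_step(slope, prev_slope, prev_delta, epsilon=0.55, mu=1.75):
    """One Quickprop weight change (simplified sketch, not the real code).

    slope      -- minus the error gradient for this weight, summed over
                  one pass through the current batch
    prev_slope -- the same quantity from the previous pass over the batch
    prev_delta -- the weight change actually applied on the previous pass
    """
    shrink = mu / (1.0 + mu)
    step = 0.0
    if prev_delta > 0.0:
        if slope > 0.0:
            step += epsilon * slope      # still descending: add a gradient term
        if slope > shrink * prev_slope:
            step += mu * prev_delta      # cap the parabolic jump
        else:
            # jump toward the minimum of the fitted parabola
            step += prev_delta * slope / (prev_slope - slope)
    elif prev_delta < 0.0:
        if slope < 0.0:
            step += epsilon * slope
        if slope < shrink * prev_slope:
            step += mu * prev_delta
        else:
            step += prev_delta * slope / (prev_slope - slope)
    else:
        step = epsilon * slope           # first step, or right after a reset
    return step
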
That batch does not necessarily have to include the whole original
training set, however. You want enough examples to get a good, stable
estimate of the gradient, but not a lot more than that. It is possible to
change the training set used in Quickprop from time to time, but whenever
you do that you should zero out the previous-slope and previous-delta
values, so that stale gradient information from the old batch doesn't
produce bogus steps. So if the problem is that your training data set is
too large and redundant, a possible solution is to choose a smaller batch
size, train on that, and switch batches occasionally.

(Martin Moller has a nice paper on choosing the batch size for his Scaled
Conjugate Gradient algorithm in Neural Networks for Signal Processing 2,
IEEE Press, 1992. Similar ideas could be used with Quickprop or Cascor.)

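As an illustration of switching batches, the outer loop might look
something like the sketch below, reusing the quickprop_step sketch above.
Here compute_slopes stands for whatever routine accumulates the negated
error gradients over one pass through a batch; it and the other names are
invented for the example, not taken from the distributed code.

import numpy as np

def train_on_rotating_batches(weights, batches, compute_slopes,
                              passes_per_batch=50):
    """Quickprop-style training on small batches, swapped now and then.

    weights        -- 1-D array of weights, updated in place
    batches        -- list of modest-sized training batches
    compute_slopes -- function(weights, batch) -> array of negated gradients
                      (a stand-in for the usual forward/backward pass)
    """
    prev_slopes = np.zeros_like(weights)
    prev_deltas = np.zeros_like(weights)
    for batch in batches:
        # Switching batches invalidates the remembered slopes and steps,
        # so zero them out to avoid bogus parabolic jumps.
        prev_slopes[:] = 0.0
        prev_deltas[:] = 0.0
        for _ in range(passes_per_batch):
            slopes = compute_slopes(weights, batch)
            deltas = np.array([quickprop_step(s, ps, pd) for s, ps, pd
                               in zip(slopes, prev_slopes, prev_deltas)])
            weights += deltas
            prev_slopes[:] = slopes
            prev_deltas[:] = deltas
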
If you want true online updating, without storing up a batch of examples,
you can use the structure of Cascor, but with the Quickprop updating
replaced by stochastic backprop. If you do that, you cannot use the trick
of caching the errors and unit values for all the cases in an epoch. Each
new sample must be allowed to propagate through the active net. These
changes will cost you a lot of speed, but you may get that back (and more)
for highly redundant data sets in which per-epoch updating is terribly
inefficient. If you get impatient and quit the training phases too early,
before backprop has really converged, then Cascor will indeed create too
many units and generalize poorly.

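To give a feel for the online variant, here is a rough sketch of a
per-sample update of the output weights in a Cascor-style net. Each new
sample is pushed through the frozen cascade of hidden units on the spot,
and the output weights are nudged immediately by plain stochastic
backprop, so there is no per-epoch cache of errors or unit values. The
candidate phase would need a similar per-sample treatment of its
correlation measure. The structure and names here are only meant to
illustrate the idea; this is not the distributed code.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def online_output_update(x, target, hidden_weights, output_weights, rate=0.1):
    """One stochastic-backprop step on the output weights (sketch only).

    x              -- one input vector
    target         -- the desired output vector for this sample
    hidden_weights -- list of frozen fan-in weight vectors, one per hidden
                      unit; unit i sees the inputs, a bias, and units 0..i-1
    output_weights -- matrix with one row per output unit, updated in place
    """
    # Propagate this single sample through the active net; every unit value
    # is computed on the spot, nothing is cached across an epoch.
    values = list(np.append(x, 1.0))          # inputs plus a bias input
    for w in hidden_weights:
        values.append(sigmoid(np.dot(w, values[:len(w)])))
    values = np.asarray(values)

    outputs = sigmoid(output_weights @ values)
    err = target - outputs
    # Plain per-sample gradient step on sigmoid output units.
    grad = (err * outputs * (1.0 - outputs))[:, None] * values[None, :]
    output_weights += rate * grad
    return float(np.sum(err ** 2))            # squared error for this sample
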
When you are using stochastic backprop to do the updates, the training
never reaches a truly quiescent state. The error keeps bouncing up and
down as the individual samples go by. So you have to modify the quiescence
tests that are used to terminate the training phases. You need to take an
average over many samples, and declare the training phase over when there
is no change in the average error for some period of time. Even so, there
is the danger of stopping the training when the net is in a less than
optimal state. It is probably best to gradually reduce the training rate
to reduce these fluctuations.

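One way to implement such a test is to smooth the per-sample error with a
running average and declare the phase over only when the smoothed error
has stopped improving for a long stretch. A sketch of that bookkeeping
follows; the smoothing factor, threshold, and patience are arbitrary, not
tuned values.

class StochasticQuiescence:
    """Phase-termination test for stochastic updating (illustrative sketch)."""

    def __init__(self, smoothing=0.99, threshold=0.01, patience=500):
        self.smoothing = smoothing    # weight given to the old average
        self.threshold = threshold    # improvement needed to reset the clock
        self.patience = patience      # samples to wait without improvement
        self.avg_error = None
        self.best_avg = float("inf")
        self.stale = 0

    def update(self, sample_error):
        """Feed one sample's error; return True when the phase should stop."""
        if self.avg_error is None:
            self.avg_error = sample_error
        else:
            self.avg_error = (self.smoothing * self.avg_error
                              + (1.0 - self.smoothing) * sample_error)
        if self.avg_error < self.best_avg - self.threshold:
            self.best_avg = self.avg_error
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience

The gradual reduction of the training rate can be layered on top of this,
for instance by shrinking the rate a little whenever the smoothed error
fails to improve.
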
Candidate training and output training can, to some extent, be overlapped,
but it is probably a good idea to run the candidate training for a while
after the output weights have been frozen. There may be a temporary,
discontinuous up-tick in the network's error when you tenure a new hidden
unit. If such a glitch is worrisome, you can minimize the damage by
installing the new unit with a very small (or zero) output weight. This
too will slow down training.

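In outline, installing the winner with a tiny output weight might look
like the sketch below. The attribute names are invented for the
illustration; the point is just that the new unit's fan-in weights are
frozen at tenure time and its fan-out weights start at or near zero, so
adding the unit cannot cause a sudden jump in the output error.

import numpy as np

def tenure_candidate(net, candidate, initial_scale=0.0):
    """Install the winning candidate unit into the active net (sketch only).

    net       -- object with a list `hidden_weights` of frozen fan-in
                 vectors and a matrix `output_weights` (invented names)
    candidate -- object with `input_weights` and per-output `correlations`
    """
    # Freeze the candidate's fan-in weights; they are never trained again.
    net.hidden_weights.append(candidate.input_weights.copy())

    # Start the new unit's output-side weights at zero (or a small multiple
    # of the negated correlations), so tenure cannot produce a big glitch.
    new_column = -initial_scale * candidate.correlations
    net.output_weights = np.column_stack([net.output_weights, new_column])

With initial_scale set to zero the new unit has no immediate effect at
all, and output training then grows its weights at whatever pace the data
supports.
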
Sorry, I don't have any citations to published work on this topic. All
this is based on my own experience and what others have told me informally.
If anyone else knows of such citations, I'd like to hear about them.

-- Scott

===========================================================================
Scott E. Fahlman                        Internet: sef+@cs.cmu.edu
Senior Research Scientist               Phone: 412 268-2575
School of Computer Science              Fax: 412 681-5739
Carnegie Mellon University              Latitude: 40:26:33 N
5000 Forbes Avenue                      Longitude: 79:56:48 W
Pittsburgh, PA 15213
===========================================================================