- Path: sparky!uunet!mcsun!uknet!cam-eng!ajr
- From: ajr@eng.cam.ac.uk (Tony Robinson)
- Newsgroups: comp.ai.neural-nets
- Subject: Re: Summary of CUPS + new question
- Message-ID: <1992Sep10.090501.26265@eng.cam.ac.uk>
- Date: 10 Sep 92 09:05:01 GMT
- References: <BuCFut.F6t.1@cs.cmu.edu>
- Sender: ajr@eng.cam.ac.uk (Tony Robinson)
- Organization: Cambridge University Engineering Department, UK
- Lines: 24
- Nntp-Posting-Host: dsl.eng.cam.ac.uk
-
- In comp.ai.neural-nets Scott Fahlman <sef+@cs.cmu.edu> wrote:
-
- > It's probably more correct to update
- > a weight AFTER the error has propagated backwards across it, but if your
- > weight-update steps are large enough for this to matter, you're going to
- > get in trouble anyway due to stochastic fluctuations in the data.
-
- Agreed that this should not matter much either way, but I think the original
- poster, Jeff Berkowitz <jjb@sequent.com>, has a point.
-
- Doing the calculation of the "deltas" first, and then the weight updates, gives
- the true per-pattern gradient based on the original weights. However, if you
- calculate the deltas for the next layer down from the already-updated weights
- higher up, you are effectively working partly in the new weight space. I think
- this could give a small speed-up (I'd have to think a bit more to be sure),
- but to do the process properly you would also have to recompute all the
- activations higher up and then recalculate the deltas from the new weights.
- That is O(n^2) in the number of layers, so it isn't worthwhile.
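-
- To make the two orderings concrete, here is a rough sketch (plain Python/NumPy
- with made-up names, just to illustrate the idea, not code from anyone's
- simulator) of the backward pass for a small sigmoid net, where
- a[l+1] = sigmoid(a[l] @ W[l]):
-
-     import numpy as np
-
-     def backward_deltas_first(W, a, target, lr):
-         """True per-pattern gradient: every delta is computed from the
-         ORIGINAL weights, and all the updates are applied at the end."""
-         deltas = [None] * len(W)
-         deltas[-1] = (a[-1] - target) * a[-1] * (1 - a[-1])   # output layer
-         for l in range(len(W) - 2, -1, -1):
-             back = deltas[l + 1] @ W[l + 1].T                 # old weights
-             deltas[l] = back * a[l + 1] * (1 - a[l + 1])
-         for l in range(len(W)):
-             W[l] -= lr * np.outer(a[l], deltas[l])
-
-     def backward_update_as_you_go(W, a, target, lr):
-         """Update each weight matrix as soon as its delta is known, so the
-         deltas for the layers below are computed from the NEW weights."""
-         delta = (a[-1] - target) * a[-1] * (1 - a[-1])
-         for l in range(len(W) - 1, -1, -1):
-             W[l] -= lr * np.outer(a[l], delta)                # update first,
-             if l > 0:                                         # then propagate
-                 delta = (delta @ W[l].T) * a[l] * (1 - a[l])  # new weights
-
- The "proper" version would also re-run the forward pass through the updated
- layers before computing each lower delta, which is where the O(n^2) in the
- number of layers comes from.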
-
- I'm not a great fan of per-pattern update anyway (preferring to update after
- something like the square root of the number of training patterns), but I
- think this is an interesting idea.
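-
- For what it's worth, the sort of batching I have in mind looks roughly like
- this (again just a sketch, with hypothetical helper functions for the
- per-pattern gradient and the weight update):
-
-     import math
-
-     def train_epoch(patterns, W, per_pattern_grad, apply_update):
-         """Accumulate per-pattern gradients and update the weights about
-         every sqrt(N) patterns instead of after every single pattern."""
-         batch = max(1, round(math.sqrt(len(patterns))))
-         accum = None
-         for i, (x, target) in enumerate(patterns, start=1):
-             g = per_pattern_grad(W, x, target)
-             accum = g if accum is None else [ga + gn for ga, gn in zip(accum, g)]
-             if i % batch == 0 or i == len(patterns):
-                 apply_update(W, accum)
-                 accum = None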
-
- Tony [Robinson]
-