- Path: sparky!uunet!usc!rpi!usenet.coe.montana.edu!news.u.washington.edu!ogicse!das-news.harvard.edu!cantaloupe.srv.cs.cmu.edu!crabapple.srv.cs.cmu.edu!news
- From: sef@sef1.slisp.cs.cmu.edu
- Newsgroups: comp.ai.neural-nets
- Subject: Re: Summary of CUPS + new question
- Message-ID: <BuCFut.F6t.1@cs.cmu.edu>
- Date: 10 Sep 92 03:47:14 GMT
- Article-I.D.: cs.BuCFut.F6t.1
- Sender: news@cs.cmu.edu (Usenet News System)
- Organization: School of Computer Science, Carnegie Mellon
- Lines: 71
- Nntp-Posting-Host: sef1.slisp.cs.cmu.edu
-
-
- From: jjb@sequent.com (Jeff Berkowitz)
-
- Some weeks back I posted a request for real examples of the performance
- of back propagation simulators in "backprops/second." Bill Armstrong
- at the University of Alberta, Canada pointed out in a posting that the
- accepted unit of measurement was CUPS...
-
- Actually, it's a bit more complicated than this. The cleanest form of the
- backprop algorithm (called "batch" or "per-epoch" updating) does
- forward/backprop cycles through the whole training set, accumulating dE/dw
- for each weight. Then all the weights are updated at once. This gives you
- the "true" error gradient, but can be slow if the training set is large and
- redundant.
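-
- As a concrete illustration of the batch form, here is a minimal sketch in
- Python/NumPy for a hypothetical two-layer sigmoid net with squared error
- (the names batch_epoch, W1, W2, X, T, and lr are made up for the example,
- not taken from any particular simulator):
-
-     import numpy as np
-
-     def sigmoid(z):
-         return 1.0 / (1.0 + np.exp(-z))
-
-     def batch_epoch(W1, W2, X, T, lr):
-         """One per-epoch (batch) update: accumulate dE/dw over the whole
-         training set, then change all the weights at once."""
-         dW1 = np.zeros_like(W1)
-         dW2 = np.zeros_like(W2)
-         for x, t in zip(X, T):
-             # forward pass
-             h = sigmoid(W1 @ x)
-             y = sigmoid(W2 @ h)
-             # backward pass: accumulate gradients, no weight changes yet
-             dy = (y - t) * y * (1 - y)
-             dh = (W2.T @ dy) * h * (1 - h)
-             dW2 += np.outer(dy, h)
-             dW1 += np.outer(dh, x)
-         # single update step, applied once per epoch
-         W2 -= lr * dW2
-         W1 -= lr * dW1
-         return W1, W2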
-
- An alternative form (unfortunately, the famous Rumelhart-Hinton-Williams
- paper tends to muddle these two forms together) is called "stochastic",
- "online", or "per-example" updating. In this case you update the weights
- by a small amount as each training example goes by. For very small step
- sizes, this amounts to the same thing; if your steps get too large, it can
- lead to trouble, since several atypical samples in a row can knock you far
- off course.
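-
- For contrast, a sketch of the per-example version (reusing sigmoid and the
- same hypothetical two-layer net from the batch sketch above):
-
-     def online_epoch(W1, W2, X, T, lr):
-         """Stochastic / per-example updating: take a small step after
-         every training case instead of once per epoch."""
-         for x, t in zip(X, T):
-             h = sigmoid(W1 @ x)
-             y = sigmoid(W2 @ h)
-             dy = (y - t) * y * (1 - y)
-             dh = (W2.T @ dy) * h * (1 - h)
-             # update immediately; if lr is too large, a run of atypical
-             # samples can pull the weights far off course
-             W2 -= lr * np.outer(dy, h)
-             W1 -= lr * np.outer(dh, x)
-         return W1, W2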
-
- People generally speak of CUPS (connection UPDATES per second) only in
- connection with the second kind of algorithm. If you're doing batch
- training, a more appropriate measure is CPS (connections per second), since
- the update step is outside the inner loop. Things are further complicated
- by the fact that some people report CPS for the forward pass only, and
- others count the time required by both the forward and backward passes.
- And more complicated still is the question of what network you are talking
- about. As the fan-in of units and the number of training examples
- increase, the CPS numbers go up, since the network is spending more time in
- its innermost loops. So you sometimes see reports of "asymptotic CPS": the
- speed achieved in the limit of large fan-in and many training cases per
- epoch.
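-
- To make the bookkeeping concrete (all of the numbers below are made up),
- CPS is just connection-crossings divided by the wall-clock time you
- measured, so you have to say which passes that time covers:
-
-     def cps(n_connections, n_examples, seconds):
-         """Connections per second.  Whether `seconds` covers the forward
-         pass only, or the forward + backward pair, is a reporting
-         convention that has to be stated alongside the number."""
-         return n_connections * n_examples / seconds
-
-     # hypothetical net: 10,000 connections, 1,000 training cases per epoch
-     print(cps(10_000, 1_000, 4.0))  # forward pass alone took 4 s -> 2.5 MCPS
-     print(cps(10_000, 1_000, 9.0))  # forward + backward took 9 s -> ~1.1 MCPS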
-
- In batch training, counting both the forward and backward passes, the
- innermost loop is essentially a dot product in the forward direction and a
- dot product with added bookkeeping in the backward direction. If
- everything is perfectly balanced, that's about 3-4 multiply/accumulate
- instructions (or 6-8 FLOPS) per connection-crossing. So a DSP chip rated
- at 20 MFLOPS can hope to approach an asymptote of about 2.5 to 3.3 MCPS.
- (Some hardware backprop implementations may use integers rather than
- floating point, but that's another story.) A lot of the accelerator boards
- being sold as "neurocomputers" max out in the 1-3 MCPS range. A hot
- workstation will do about as well.
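-
- That asymptote is just the chip's peak rate divided by the work per
- connection-crossing; a two-line check of the arithmetic:
-
-     peak_mflops = 20.0                       # rated speed of the DSP chip
-     for flops_per_crossing in (6, 8):        # i.e. 3-4 multiply/accumulates
-         print(peak_mflops / flops_per_crossing)  # -> about 3.3 and 2.5 MCPS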
-
- One famous neural net implementation was done on the ten-processor Warp
- machine at CMU. This machine was rated at 100 MFLOPS, and gave an
- asymptotic backprop performance of about 17-20 MCPS, depending on how you
- count. Some large SIMD machines should be capable of 1 GCPS or so, but
- there's a problem keeping these machines fed with sufficient data.
-
- To answer your more recent question: If you're doing per-epoch update, the
- weights are not changed until after all the forward-backward passes are
- completed. If you are doing per-example updating, the back-propagation and
- weight updating can be interspersed. It's probably more correct to update
- a weight AFTER the error has propagated backwards across it, but if your
- weight-update steps are large enough for this to matter, you're going to
- get in trouble anyway due to stochastic fluctuations in the data.
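-
- A sketch of that interleaving for a deeper net (Ws is a hypothetical list
- of weight matrices, acts the list of layer activations saved from the
- forward pass, delta_out the error delta at the output layer):
-
-     def backward_and_update(Ws, acts, delta_out, lr):
-         """Per-example mode: walk backwards through the layers, changing
-         each weight matrix only AFTER the error has crossed it."""
-         delta = delta_out
-         for i in reversed(range(len(Ws))):
-             a_in = acts[i]                    # inputs to layer i
-             # propagate the error across Ws[i] using the old weights...
-             delta_below = (Ws[i].T @ delta) * a_in * (1 - a_in)
-             # ...and only then update Ws[i]
-             Ws[i] -= lr * np.outer(delta, a_in)
-             delta = delta_below
-         return Ws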
-
- -- Scott
-
- ===========================================================================
- Scott E. Fahlman
- School of Computer Science
- Carnegie Mellon University
- 5000 Forbes Avenue
- Pittsburgh, PA 15213
-
- Internet: sef+@cs.cmu.edu
-