NetNews Usenet Archive 1992 #20

Internet Message Format | 1992-09-09 | 4.1 KB

Path: sparky!uunet!usc!rpi!usenet.coe.montana.edu!news.u.washington.edu!ogicse!das-news.harvard.edu!cantaloupe.srv.cs.cmu.edu!crabapple.srv.cs.cmu.edu!news
From: sef@sef1.slisp.cs.cmu.edu
Newsgroups: comp.ai.neural-nets
Subject: Re: Summary of CUPS + new question
Message-ID: <BuCFut.F6t.1@cs.cmu.edu>
Date: 10 Sep 92 03:47:14 GMT
Article-I.D.: cs.BuCFut.F6t.1
Sender: news@cs.cmu.edu (Usenet News System)
Organization: School of Computer Science, Carnegie Mellon
Lines: 71
Nntp-Posting-Host: sef1.slisp.cs.cmu.edu

    From: jjb@sequent.com (Jeff Berkowitz)

    Some weeks back I posted a request for real examples of the
    performance of back propagation simulators in "backprops/second."
    Bill Armstrong at the University of Alberta, Canada pointed out in
    a posting that the accepted unit of measurement was CUPS...

Actually, it's a bit more complicated than this.

The cleanest form of the backprop algorithm (called "batch" or "per-epoch" updating) runs forward/backprop cycles through the whole training set, accumulating dE/dw for each weight. Then all the weights are updated at once. This gives you the "true" error gradient, but can be slow if the training set is large and redundant.

An alternative form (unfortunately, the famous Rumelhart-Hinton-Williams paper tends to muddle these two forms together) is called "stochastic", "online", or "per-example" updating. In this case you update the weights by a small amount as each training example goes by. For very small step sizes, this amounts to the same thing; if your steps get too large, it can lead to trouble, since several atypical samples in a row can knock you far off course.

People generally speak of CUPS (connection UPDATES per second) only in connection with the second kind of algorithm. If you're doing batch training, a more appropriate measure is CPS (connections per second), since the update step is outside the inner loop.
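
The difference between the two update styles can be sketched in a few lines of Python. This is only an illustrative toy, not anything from the simulators discussed here: the single linear unit, the squared-error measure, the three-example data set and the names gradient, batch_epoch and online_epoch are all assumptions made for the example.

```python
# Toy sketch (hypothetical setup): one linear unit with error
# E = 0.5 * (y - t)^2, trained for a single epoch in both styles.

def gradient(w, x, t):
    """dE/dw for one example (x, t), where y is the dot product w.x."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [(y - t) * xi for xi in x]

def batch_epoch(w, data, lr):
    """Per-epoch updating: accumulate dE/dw over the set, update once."""
    total = [0.0] * len(w)
    for x, t in data:
        g = gradient(w, x, t)
        total = [a + b for a, b in zip(total, g)]
    return [wi - lr * gi for wi, gi in zip(w, total)]

def online_epoch(w, data, lr):
    """Per-example updating: change the weights as each example goes by."""
    for x, t in data:
        g = gradient(w, x, t)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# Made-up training set: (input vector, target) pairs.
data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0), ([0.0, 1.0], -1.0)]
w0 = [0.0, 0.0]

for lr in (0.001, 0.5):
    wb = batch_epoch(w0, data, lr)
    wo = online_epoch(w0, data, lr)
    gap = max(abs(a - b) for a, b in zip(wb, wo))
    print(f"step size {lr}: largest weight difference {gap:.6f}")
```

With the small step size the two styles end the epoch at nearly the same weights; with the large one they part company. Note also that the weight update in batch_epoch sits outside the loop over examples, which is why counting "updates" per second is a poor fit for batch training.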

Things are further complicated by the fact that some people report CPS for the forward pass only, and others count the time required by both the forward and backward passes.

And more complicated still is the question of what network you are talking about. As the fan-in of units and the number of training examples increase, the CPS numbers go up, since the network is spending more time in its innermost loops. So you sometimes see reports of "asymptotic CPS": the speed achieved in the limit of large fan-in and many training cases per epoch.

In batch training, counting both the forward and backward passes, the innermost loop is essentially a dot product in the forward direction and a dot product with added bookkeeping in the backward direction. If everything is perfectly balanced, that's about 3-4 multiply/accumulate instructions (or 6-8 FLOPS) per connection-crossing. So a DSP chip rated at 20 MFLOPS can hope to approach an asymptote of about 2.5 to 3.3 MCPS (20 MFLOPS divided by 8 and by 6 FLOPS per connection, respectively). (Some hardware backprop implementations may use integers rather than floating point, but that's another story.)

A lot of the accelerator boards being sold as "neurocomputers" max out in the 1-3 MCPS range. A hot workstation will do about as well. One famous neural net implementation was done on the ten-processor Warp machine at CMU. This machine was rated at 100 MFLOPS, and gave an asymptotic backprop performance of about 17-20 MCPS, depending on how you count. Some large SIMD machines should be capable of 1 GCPS or so, but there's a problem keeping these machines fed with sufficient data.

To answer your more recent question: If you're doing per-epoch update, the weights are not changed until after all the forward-backward passes are completed. If you are doing per-example updating, the back-propagation and weight updating can be interspersed.
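
For the per-example case, the two possible orderings (changing a weight before or after the error has been propagated back across it) can be compared on a toy network. The sketch below is an assumption-laden illustration only: the one-input, one-hidden-unit, one-output net, the tanh hidden unit, the starting weights and the names per_example_step and ordering_gap are all invented for the example.

```python
import math

def per_example_step(w1, w2, x, t, lr, update_before_backprop):
    """One per-example update on a hypothetical 1-1-1 net:
    h = tanh(w1 * x), y = w2 * h, E = 0.5 * (y - t)^2."""
    h = math.tanh(w1 * x)
    y = w2 * h
    d_out = y - t              # dE/dy at the output unit
    slope = 1.0 - h * h        # derivative of tanh at the hidden unit
    if update_before_backprop:
        # Change w2 first, then push the error back across the NEW w2.
        w2_new = w2 - lr * d_out * h
        d_hid = d_out * w2_new * slope
    else:
        # Push the error back across the OLD w2, then change it.
        d_hid = d_out * w2 * slope
        w2_new = w2 - lr * d_out * h
    w1_new = w1 - lr * d_hid * x
    return w1_new, w2_new

def ordering_gap(lr):
    """Difference in the updated w1 between the two orderings."""
    a, _ = per_example_step(0.5, 0.5, 1.0, 1.0, lr, True)
    b, _ = per_example_step(0.5, 0.5, 1.0, 1.0, lr, False)
    return abs(a - b)

for lr in (0.01, 1.0):
    print(f"step size {lr}: w1 differs by {ordering_gap(lr):.6f}")
```

In this toy the output-layer weight w2 comes out the same either way; only the lower weight w1 is affected, and the discrepancy shrinks with the square of the step size, so it matters only when the steps are already large.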
It's probably more correct to update a weight AFTER the error has propagated backwards across it, but if your weight-update steps are large enough for this to matter, you're going to get in trouble anyway due to stochastic fluctuations in the data.

-- Scott

===========================================================================
Scott E. Fahlman
School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213

Internet: sef+@cs.cmu.edu