- Path: sparky!uunet!usc!sdd.hp.com!uakari.primate.wisc.edu!ames!lll-winken!tazdevil!henrik
- From: henrik@mpci.llnl.gov (Henrik Klagges)
- Newsgroups: comp.ai.neural-nets
- Subject: Re: Reducing Training time vs Generalisation
- Message-ID: <?.714340347@tazdevil>
- Date: 20 Aug 92 19:52:27 GMT
- References: <Bt9GIx.9In.1@cs.cmu.edu>
- Sender: usenet@lll-winken.LLNL.GOV
- Lines: 32
- Nntp-Posting-Host: tazdevil.llnl.gov
-
- sef@sef-pmax.slisp.cs.cmu.edu writes:
- >For example, in the example about the big
- >gaussian spike, it would drive the output weight to zero if the Gaussian is
- >not helping to fit the data.
-
- Key point. A few add-on 'reasonability criteria' like weight decay are
- quite effective at avoiding pathological results.
-
- >Well, since you keep pounding on this, I will point out that in most
- >backprop-style nets after training, almost all of the hidden units are
- >saturated almost all of the time. So you can replace them with sharp
-
- Same in our experiments. The decision trees being built do benefit a lot
- from the remaining nonlinearities, though (smoother decision surfaces,
- really 8-).
-
- >Myself, I prefer to think in terms of parallel hardware, so lazy evaluation
- >isn't an issue. Yes, sigmoid unit hardware is a bit more expensive to
- >implement than simple gates, but I don't need nearly as many of them.
-
- It is not terribly expensive - a 256-entry table is usually enough.
- Pipelined access to such a lookup table can run at one lookup per cycle
- with a pipe stall of less than 5 (if not much better, hihi). Moreover,
- weight accumulation/update are matrix operations, while the lookup is
- only a vector operation. It is no bottleneck at all.
-
- Cheers,
- Henrik
-
-
- IBM Research Division physics group Munich
- massively parallel group at Lawrence Livermore
-