Path: sparky!uunet!cs.utexas.edu!sun-barr!olivea!spool.mu.edu!yale.edu!yale!gumby!destroyer!cs.ubc.ca!alberta!arms
From: arms@cs.UAlberta.CA (Bill Armstrong)
Newsgroups: comp.ai.neural-nets
Subject: Re: Questions about sigmoids etc.
Keywords: Sigmoids, output layers
Message-ID: <arms.724120505@spedden>
Date: 12 Dec 92 00:35:05 GMT
References: <waugh.723705045@probitas> <1992Dec8.161935@sees.bangor.ac.uk> <1992Dec9.160218.25286@cs.brown.edu> <1992Dec10.084458.12506@dxcern.cern.ch> <1992Dec10.123626.28838@cs.brown.edu>
Sender: news@cs.UAlberta.CA (News Administrator)
Organization: University of Alberta, Edmonton, Canada
Lines: 42
Nntp-Posting-Host: spedden.cs.ualberta.ca

hm@cs.brown.edu (Harry Mamaysky) writes:

>In article <1992Dec10.084458.12506@dxcern.cern.ch>, block@dxlaa.cern.ch (Frank Block) writes:
>|>
>|> In article <1992Dec9.160218.25286@cs.brown.edu>, pcm@cs.brown.edu (Peter C. McCluskey) writes:
>|> |> In article <1992Dec8.161935@sees.bangor.ac.uk>, paulw@sees.bangor.ac.uk
>|> |> (Mr P Williams (AD)) writes:
>|> |> |> For backpropagation networks (i.e. Rumelhart, McClelland and Williams),
>|> |> |> it is necessary to have a monotonically increasing, DIFFERENTIABLE
>|> |> |> function as the output

>|>
>|> But how are you going to train a network with non-differentiable functions?
>|> Certainly not with the standard BP?

A partial derivative is only one way of measuring the influence of a
weight change on the output error of a network. You can also use
ratios of non-infinitesimal (finite) perturbations of values. Even
though we speak of the derivative, we can never take the limit in an
implementation anyway: on a real machine, every measurement of a
weight's influence is a finite difference.
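
To make that concrete, here is a little Python sketch (illustrative
only, not code from any package; the "network" is just a single
linear unit, and names like error_of and sensitivity are made up).
It measures a weight's influence with a finite perturbation instead
of a derivative, then descends against that measure:

    # Sketch: finite-perturbation sensitivity in place of a partial
    # derivative.  The "network" is a single linear unit, purely for
    # illustration.

    def error_of(weights, data):
        """Sum of squared errors over (inputs, target) pairs."""
        return sum((y - sum(w * x for w, x in zip(weights, xs))) ** 2
                   for xs, y in data)

    def sensitivity(weights, data, i, delta=0.01):
        """Finite ratio (E(w + delta*e_i) - E(w)) / delta.
        No limit is taken; delta stays non-infinitesimal."""
        bumped = list(weights)
        bumped[i] += delta
        return (error_of(bumped, data) - error_of(weights, data)) / delta

    def step(weights, data, lr=0.1, delta=0.01):
        """Descend against the measured sensitivities, just as one
        would with true partial derivatives."""
        return [w - lr * sensitivity(weights, data, i, delta)
                for i, w in enumerate(weights)]

    # Usage: learn y = 2*x0 - x1 from four samples.
    data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
            ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
    w = [0.0, 0.0]
    for _ in range(200):
        w = step(w, data)
    print(w)  # approaches [2.0, -1.0]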

The ALN adaptive algorithm (atree release 2.7, in pub/atre27.exe for
Windows 3.x on menaik.cs.ualberta.ca [129.128.4.241]) deals with
finite changes. In fact, since we are then in a boolean tree, either
the output changes when a "weight" changes or it doesn't. This
logical measure is much faster to evaluate in combinational hardware
than a derivative, and in software it can be computed by lazy
evaluation of the logic.
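
The lazy evaluation goes roughly like this Python sketch (my reading
of the idea, not the actual atree code; it assumes each leaf index
occurs once in the tree):

    # Sketch: in a boolean tree of AND/OR nodes, a flipped input either
    # propagates to the root or it doesn't -- a yes/no "sensitivity"
    # instead of a derivative.  Assumes each leaf index occurs once.

    def evaluate(node, x):
        kind = node[0]
        if kind == "LEAF":
            return x[node[1]]
        a, b = evaluate(node[1], x), evaluate(node[2], x)
        return (a and b) if kind == "AND" else (a or b)

    def flip_reaches_root(node, x, leaf):
        """Would flipping input `leaf` change this node's output?"""
        kind = node[0]
        if kind == "LEAF":
            return node[1] == leaf
        l, r = node[1], node[2]
        if kind == "AND":
            # A flip below one child matters only if the sibling is
            # True; testing the sibling first skips subtrees (lazy).
            return ((evaluate(r, x) and flip_reaches_root(l, x, leaf))
                    or (evaluate(l, x) and flip_reaches_root(r, x, leaf)))
        # OR: a flip below one child matters only if the sibling is False.
        return ((not evaluate(r, x) and flip_reaches_root(l, x, leaf))
                or (not evaluate(l, x) and flip_reaches_root(r, x, leaf)))

    # (x0 AND x1) OR x2, evaluated at x = (True, False, False):
    tree = ("OR", ("AND", ("LEAF", 0), ("LEAF", 1)), ("LEAF", 2))
    x = (True, False, False)
    print(flip_reaches_root(tree, x, 1))  # True: x1 gates the AND
    print(flip_reaches_root(tree, x, 0))  # False: blocked by x1 = False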

I think that in large adaptive systems we have to give up on
differentiability, and even on the idea of having only functions; we
need to learn relationships that are many-to-many. In the general
case, I don't see any way we can use derivatives; however, finite
perturbations can still be used to reduce error. Error reduction of
this kind, combined with Widrow's idea of "least disturbance",
provides lots of possible adaptive algorithms.
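
For instance, here is a toy Python sketch of one such algorithm (my
own illustration, not anything published; it reuses error_of from
the sketch above): among the finite weight perturbations that reduce
the error, apply the smallest one.

    # Sketch: one of many possible perturbation-based rules.  Among the
    # finite weight changes that reduce the error, apply the smallest
    # one -- a crude rendering of Widrow's "least disturbance" idea.

    def least_disturbance_step(weights, data,
                               deltas=(-0.5, -0.1, -0.02, 0.02, 0.1, 0.5)):
        base = error_of(weights, data)
        best = None  # (abs(delta), weight index, delta)
        for i in range(len(weights)):
            for d in deltas:
                trial = list(weights)
                trial[i] += d
                if error_of(trial, data) < base and \
                   (best is None or abs(d) < best[0]):
                    best = (abs(d), i, d)
        if best is None:
            return weights          # no single perturbation helps
        new = list(weights)
        new[best[1]] += best[2]
        return new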

--
***************************************************
Prof. William W. Armstrong, Computing Science Dept.
University of Alberta; Edmonton, Alberta, Canada T6G 2H1
arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071