- Newsgroups: comp.ai.neural-nets
- Path: sparky!uunet!cs.utexas.edu!wupost!gumby!destroyer!ubc-cs!unixg.ubc.ca!kakwa.ucs.ualberta.ca!alberta!arms
- From: arms@cs.UAlberta.CA (Bill Armstrong)
- Subject: Re: Reducing Training time vs Generalisation
- Message-ID: <arms.714014919@spedden>
- Keywords: back propagation, training, generalisation
- Sender: news@cs.UAlberta.CA (News Administrator)
- Nntp-Posting-Host: spedden.cs.ualberta.ca
- Organization: University of Alberta, Edmonton, Canada
- References: <1992Aug16.063825.15300@julian.uwo.ca> <1992Aug16.213939.15944@ccu1.aukuni.ac.nz>
- Date: Mon, 17 Aug 1992 01:28:39 GMT
- Lines: 78
-
- edwin@ccu1.aukuni.ac.nz (Edwin Ng) writes:
-
-
- Luke Koop's summary etc. deleted.
-
- >Thanks for the summary Luke. I'd like to ask if anyone has
- >anything to add about the quality of generalisation
- >resulting from using different parameters to speed up
- >training??
-
- ...
-
- >I ended up using a learning rate of 0.001, which amounted
- >to very tedious training in order to get good generalisation.
-
- >Does anyone have any advice on how I can speed up training
- >without losing generalisation? Or is this a tradeoff
- >that can't be changed (some kind of conservation law) ?
-
- >I have tried using Scott Fahlman's Cascade Correlation but
- >the generalisation was much worse than backprop although
- >it learnt very quickly.
-
- Before one gets deeply into such questions, I think one should specify
- what one means by "generalization". This shouldn't degenerate to "I
- tried this method on this data set, and it didn't work so well".
- Rather, it should mean something like: for a test point at distance d
- from a correctly learned training point, the response is still
- correct (this definition works for boolean and multi-class problems).
- You could extend the definition to continuous functions using
- interpolation.
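-
- To make the measurement concrete for the boolean / multi-class case,
- here is a rough Python sketch (the net() callable, returning one class
- label per input row, is just an interface I'm assuming for
- illustration, not anybody's actual code):
-
- import numpy as np
-
- def generalization_at_distance(net, X_train, y_train, X_test, y_test, d):
-     # keep only the training points the net actually learned correctly
-     anchors = X_train[net(X_train) == y_train]
-     hits = total = 0
-     for x, y in zip(X_test, y_test):
-         # consider only test points within distance d of a learned point
-         if np.linalg.norm(anchors - x, axis=1).min() <= d:
-             total += 1
-             hits += int(net(x[None, :])[0] == y)
-     return hits / total if total else float('nan')
-
- Sweeping d then gives a curve rather than a single number, which says
- more than one aggregate test-set score does.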
-
- Once this is agreed upon, some questions start to make sense, like:
- "why should a multilayer perceptron generalize at all?" It's not
- because of continuity: continuous functions can oscillate as
- rapidly as you want, and can fit any finite, non-contradictory
- training set as well as you want.
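-
- A small illustration of that point (my own construction, nothing deep):
- two continuous functions that agree exactly on a finite training set
- can still differ by as much as you like between the training points.
-
- import numpy as np
-
- x_train = np.arange(6)                                # training inputs at the integers
- y_train = x_train ** 2                                # any consistent targets
-
- f = lambda x: x ** 2                                  # one continuous fit
- g = lambda x: x ** 2 + 5 * np.sin(2 * np.pi * 7 * x)  # also fits: the sine term
-                                                       #  vanishes at every integer
- assert np.allclose(f(x_train), y_train)
- assert np.allclose(g(x_train), y_train)               # both match the training set
- x = np.linspace(0, 5, 1000)
- print(np.abs(f(x) - g(x)).max())                      # ~5 away from the training points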
-
- If you could get a Lipschitz bound on the result of NN training, then
- you could be confident about getting some reasonable generalization:
- i.e. if x and x' are two input vectors, then the outputs y and y'
- would satisfy |y - y'| <= C |x - x'|. This is *much* stronger than
- continuity. Determining a reasonably small C might be a problem for a
- given net.
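-
- You can at least probe for such a constant by sampling pairs. A rough
- sketch (the net() interface is again my assumption, and the ratio it
- returns is only a lower bound on the true C, so it can rule a net out
- but never certify one):
-
- import numpy as np
-
- def empirical_lipschitz(net, X, n_pairs=100000, seed=0):
-     rng = np.random.default_rng(seed)
-     y = net(X).reshape(len(X), -1)            # one output row per input
-     i = rng.integers(0, len(X), n_pairs)
-     j = rng.integers(0, len(X), n_pairs)
-     dy = np.linalg.norm(y[i] - y[j], axis=1)
-     dx = np.linalg.norm(X[i] - X[j], axis=1)
-     keep = dx > 0                             # ignore identical input pairs
-     return (dy[keep] / dx[keep]).max()        # largest observed |y - y'| / |x - x'|
-
- For a plain feedforward net whose activations are themselves Lipschitz
- with constant at most 1, a guaranteed (though usually very pessimistic)
- C is the product of the operator norms of the weight matrices.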
-
- Other criteria of good generalization might include monotonicity of the
- synthesized function.
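-
- Monotonicity, at least, is easy to spot-check along one coordinate
- (again only a sketch, with the same assumed net() interface):
-
- import numpy as np
-
- def monotone_along(net, x0, dim, lo, hi, steps=200):
-     X = np.tile(np.asarray(x0, float), (steps, 1))
-     X[:, dim] = np.linspace(lo, hi, steps)    # sweep one input, hold the rest at x0
-     y = net(X).reshape(steps, -1)[:, 0]
-     return bool(np.all(np.diff(y) >= 0))      # non-decreasing along this slice?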
-
- In the case of adaptive logic networks, generalization is based on the
- fact that perturbations of an input vector tend to get filtered out as
- the signal is processed in a tree: a perturbation arriving at an
- AND-gate, for example, only has a 50% chance of getting through.
- Namely if the other input is a 0, the perturbation is cut off. This
- is a very simple idea, and it works, as is illustrated in the atree
- release 2.7 software by an OCR demo that experts have said is "quite
- impressive". It is something like a Lipschitz condition, but not on
- |y - y'|, but rather on the probability that y != y'.
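-
- If you want to see the filtering argument without fetching atree, a toy
- simulation is enough: build a random balanced tree of 2-input AND/OR
- gates, flip one input bit, and count how often the root notices. The
- code below is my own throwaway Python, not anything from the release:
-
- import random
-
- def eval_tree(gate_levels, leaves):
-     vals = list(leaves)
-     for gates in gate_levels:                          # reduce one level at a time
-         vals = [(a & b) if g == 'AND' else (a | b)
-                 for g, a, b in zip(gates, vals[0::2], vals[1::2])]
-     return vals[0]
-
- def flip_through_probability(depth=6, trials=20000, seed=0):
-     rng = random.Random(seed)
-     changed = 0
-     for _ in range(trials):
-         gate_levels = [[rng.choice(['AND', 'OR']) for _ in range(2 ** d)]
-                        for d in range(depth - 1, -1, -1)]
-         leaves = [rng.randint(0, 1) for _ in range(2 ** depth)]
-         before = eval_tree(gate_levels, leaves)
-         leaves[rng.randrange(len(leaves))] ^= 1        # perturb a single input bit
-         changed += (eval_tree(gate_levels, leaves) != before)
-     return changed / trials
-
- print(flip_through_probability())   # roughly (1/2)**depth, per the argument above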
-
- The most impressive generalization that has been attained with ALNs
- was work by Dekang Lin, where he used tree growth and a training set
- of about 10^5 points, and obtained 91.1% generalization to a space of
- 2^512 points (the multiplexor problem). The sparsity of the training
- set in that space boggles the mind.
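-
- For readers who haven't met it: the multiplexor is the boolean function
- whose address bits select one of the data bits that follow them. The
- sizes below are only illustrative; I am not claiming they are the ones
- Dekang Lin used.
-
- def multiplexor(bits, k):
-     """k address bits select one of the 2**k data bits after them."""
-     address, data = bits[:k], bits[k:]
-     return data[int(''.join(map(str, address)), 2)]
-
- # the classic 11-input mux: 3 address bits plus 8 data bits
- print(multiplexor([1, 0, 1] + [0, 0, 0, 0, 0, 1, 0, 0], k=3))   # address 101 -> data bit 5 -> 1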
-
- Growing a net while preserving good generalization is very difficult
- and time consuming to do, with a lot of reasoning about why adding
- some structure at a particular place will promote generalization.
- Generally, any one added structure has to improve the response to many
- training points.
-
- It would be interesting to hear Scott Fahlman's ideas on how to get
- good generalization. Then you might find out why you had problems
- using Cascade Correlation.
- --
- ***************************************************
- Prof. William W. Armstrong, Computing Science Dept.
- University of Alberta; Edmonton, Alberta, Canada T6G 2H1
- arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071
-