NetNews Usenet Archive 1992 #18

Text File | 1992-08-16 | 4.0 KB | 92 lines

Newsgroups: comp.ai.neural-nets
Path: sparky!uunet!cs.utexas.edu!wupost!gumby!destroyer!ubc-cs!unixg.ubc.ca!kakwa.ucs.ualberta.ca!alberta!arms
From: arms@cs.UAlberta.CA (Bill Armstrong)
Subject: Re: Reducing Training time vs Generalisation
Message-ID: <arms.714014919@spedden>
Keywords: back propagation, training, generalisation
Sender: news@cs.UAlberta.CA (News Administrator)
Nntp-Posting-Host: spedden.cs.ualberta.ca
Organization: University of Alberta, Edmonton, Canada
References: <1992Aug16.063825.15300@julian.uwo.ca> <1992Aug16.213939.15944@ccu1.aukuni.ac.nz>
Date: Mon, 17 Aug 1992 01:28:39 GMT
Lines: 78

edwin@ccu1.aukuni.ac.nz (Edwin Ng) writes:

Luke Koop's summary etc. deleted.

>Thanks for the summary Luke. I'd like to ask if anyone has
>anything to add about the quality of generalisation
>resulting from using different parameters to speed up
>training??

...

>I ended up using a learning rate of 0.001 which amounted
>to very tedious training in order to get good generalisation.
>Does anyone have any advice on how I can speed up training
>without losing generalisation? Or is this a tradeoff
>that can't be changed (some kind of conservation law) ?

>I have tried using Scott Fahlman's Cascade Correlation but
>the generalisation was much worse than backprop although
>it learnt very quickly.

Before one gets deeply into such questions, I think one should specify
what one means by "generalization". This shouldn't degenerate to "I
tried this method on this data set, and it didn't work so well".
Rather, it should mean something like: for a test point at distance d
from a correctly learned training point, the response was still correct
(this definition works for boolean and multi-class problems). You
could generalize this definition using interpolation and continuous
functions.

Once this is agreed upon, some questions start to make sense, like: why
should a multilayer perceptron generalize at all?
It's not because of continuity, because continuous functions can
oscillate as rapidly as you want, and could fit any finite,
non-contradictory training set as well as you want.

If you could get a Lipschitz bound on the result of NN training, then
you could be confident about getting some reasonable generalization:
i.e. if x and x' are two input vectors, then the outputs y and y' would
satisfy |y - y'| <= C |x - x'|. This is *much* stronger than
continuity. Determining a reasonably small C might be a problem for a
given net.

Other criteria of good generalization might include monotonicity of the
synthesized function.

In the case of adaptive logic networks, generalization is based on the
fact that perturbations of an input vector tend to get filtered out as
the signal is processed in a tree: a perturbation arriving at an
AND-gate, for example, only has a 50% chance of getting through.
Namely, if the other input is a 0, the perturbation is cut off. This
is a very simple idea, and it works, as is illustrated in the atree
release 2.7 software by an OCR demo that experts have said is "quite
impressive". It is something like a Lipschitz condition, though not on
|y - y'| but rather on the probability that y != y'.

The most impressive generalization that has been attained with ALNs was
in work by Dekang Lin, where he used tree growth and a training set of
about 10^5 points, and obtained 91.1% generalization to a space of
2^512 points (the multiplexor problem). The sparsity of the training
set in that space boggles the mind.

Growing a net while preserving good generalization is very difficult
and time-consuming to do, and requires a lot of reasoning about why
adding some structure at a particular place will promote
generalization. Generally, a single added structure has to improve the
response to many training points.

It would be interesting to hear Scott Fahlman's ideas on how to get
good generalization. Then you might find out why you had problems
using Cascade Correlation.
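As a rough illustration of the AND-gate filtering argument, here is a minimal Python sketch. It is not taken from the atree software. It assumes a balanced binary tree with one gate type per level, uniformly random boolean inputs, and a single flipped leaf; the names `eval_tree` and `flip_propagation_rate` are hypothetical. It enumerates every input vector and reports the fraction for which flipping one leaf changes the root output, i.e. the probability that y != y'.

```python
from itertools import product


def and_gate(a, b):
    return a & b


def or_gate(a, b):
    return a | b


def eval_tree(bits, level_ops):
    """Evaluate a balanced binary gate tree.

    level_ops[i] is the gate applied at level i, counted from the
    leaves upward, so len(bits) must equal 2 ** len(level_ops).
    """
    level = list(bits)
    for op in level_ops:
        level = [op(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def flip_propagation_rate(level_ops, leaf=0):
    """Fraction of all input vectors for which flipping one leaf
    changes the root output (exhaustive, so keep the tree small)."""
    n = 2 ** len(level_ops)
    changed = 0
    for bits in product((0, 1), repeat=n):
        flipped = list(bits)
        flipped[leaf] ^= 1
        if eval_tree(bits, level_ops) != eval_tree(flipped, level_ops):
            changed += 1
    return changed / 2 ** n


# A single AND gate passes the flip only when the other input is 1.
print(flip_propagation_rate([and_gate]))                     # 0.5
# Each extra level filters the perturbation further.
print(flip_propagation_rate([and_gate, or_gate]))            # 0.375
print(flip_propagation_rate([and_gate, or_gate, and_gate]))  # 0.1640625
```

Under these assumptions the single gate reproduces the 50% figure, and the rate drops as levels are added, which is the sense in which the tree behaves like a Lipschitz-type condition on the probability that y != y'. Trained trees and non-uniform inputs will give different numbers.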
--
***************************************************
Prof. William W. Armstrong, Computing Science Dept.
University of Alberta; Edmonton, Alberta, Canada T6G 2H1
arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071