- Newsgroups: comp.ai.neural-nets
- Path: sparky!uunet!usc!sol.ctr.columbia.edu!destroyer!ubc-cs!unixg.ubc.ca!kakwa.ucs.ualberta.ca!alberta!arms
- From: arms@cs.UAlberta.CA (Bill Armstrong)
- Subject: Re: Reducing Training time vs Generalisation
- Message-ID: <arms.714289771@spedden>
- Sender: news@cs.UAlberta.CA (News Administrator)
- Nntp-Posting-Host: spedden.cs.ualberta.ca
- Organization: University of Alberta, Edmonton, Canada
- References: <Bt9GIx.9In.1@cs.cmu.edu>
- Date: Thu, 20 Aug 1992 05:49:31 GMT
- Lines: 157
-
- sef@sef-pmax.slisp.cs.cmu.edu writes:
-
- > >If you shape the training set a bit more carefully
- >
- > > * *
- > > .............* *.................
- >
- > >and use a two-hidden unit net, you can FORCE the solution with a big
- > >excursion. Only the "ankle" part of the sigmoid will fit these tossing
- > >points. However, a net with more hidden units could again create a
- > >plateau, and this would be the more likely solution.
- >
- > Same question: is this plateau a local minimum only?
-
- >Well, one ugly solution uses four hidden units and gets up to the plateau
- >in two steps, and down in two more. Again, zero error, so the solution is
- >one of the global minima.
-
- The plateau solution might not be a global minimum if we specify the
- target values along the ....... part so that the sigmoids can fit them
- perfectly only when there is a peak.
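-
- Just to fix ideas about the shape in question, here is a minimal
- sketch (mine, purely for illustration) of how two steep sigmoid
- hidden units, one stepping up and one stepping down, combine into a
- plateau; the four-hidden-unit solution quoted above just climbs and
- descends in two steps each:
-
-   import math
-
-   def sigmoid(z):
-       return 1.0 / (1.0 + math.exp(-z))
-
-   # One unit steps up near x = -1, the other steps up near x = +1;
-   # their difference is roughly 1 in between and roughly 0 outside.
-   def plateau(x, steepness=10.0):
-       up   = sigmoid(steepness * (x + 1.0))
-       down = sigmoid(steepness * (x - 1.0))
-       return up - down
-
-   for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
-       print(x, round(plateau(x), 3))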
-
- > The more hidden units and layers you have, the less transparent the
- > behavior of the whole net becomes. Some examples of "wild" behavior
- > will only appear with relatively small weights, but lots of layers:
- > linear regions of sigmoids all lining up to produce a big excursion.
- >
- >Sure, but they will have no incentive to do this unless the data, in some
- >sense, forces them to. You could always throw a narrow Gaussian unit into
- >the net, slide it over between any two training points, and give it an
- >output weight of 10^99. But it would be wrong.
-
- A good reason not to use such units, eh!
-
- > >1. Sacrifice perfect fit. Weight decay does this in a natural way, finding
- > >a compromise between exact fit on the training set and the desire for small
- > >weights (or low derivatives on the output surface). The weight-decay
- > >parameter controls the relative weight given to fit and smallness of
- > >weights.
- >
- > I agree this will work if you can get fine enough control that reaches
- > to every point of the input space to prevent excursions, and the
- > fitting of the function to the spec is good enough. Though I don't
- > deny it could be done, is this easy, or even possible to do in practice?
- >
- >You don't need fine control. If you just crank some weight decay into the
- >error measure, it will keep the net from making wild excursions without
- >being forced to by the data. For example, in the case of the big
- >Gaussian spike, it would drive the output weight to zero if the Gaussian is
- >not helping to fit the data.
-
- I seem to recall going over this before, and I believe what is
- required to upset the scheme is to have a lot of training points which
- force training to fit a solution having a peak. That is, if six points
- aren't enough, take as many as are required to overwhelm the weight
- decay. I suppose you could make the weight decay stronger as the number
- of training points increases, so that it can't be overwhelmed. But I
- think you would then lose some genuine peaks which
- weren't represented by their fair share of points in the training set.
- Isn't it true that unless your training samples are well distributed,
- you tend to wipe out features in the less well sampled parts of the
- space? You might say this is what you want, but a little control over
- it would be desirable.
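-
- To make the trade-off concrete, here is a minimal sketch (my own
- notation; the decay parameter lambda and the step size are made up
- for illustration) of weight decay cranked into a squared-error
- measure: each weight picks up an extra lambda*w term in its gradient,
- so a weight the data isn't using gets pulled toward zero, while
- enough training points pulling the other way can still overwhelm the
- decay:
-
-   # Squared error plus an L2 weight-decay penalty (sketch only).
-   def error_with_decay(errors, weights, lam=0.01):
-       data_term  = 0.5 * sum(e * e for e in errors)
-       decay_term = 0.5 * lam * sum(w * w for w in weights)
-       return data_term + decay_term
-
-   # Gradient step for one weight: the data gradient plus lam * w.
-   # With no data gradient the weight simply decays toward zero.
-   def update_weight(w, data_gradient, eta=0.1, lam=0.01):
-       return w - eta * (data_gradient + lam * w)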
-
- > >2. Sacrifice smoothness. If sharp corners are OK, it is a trivial matter
- > >to add an extra piece of machinery that simply enforces the non-excursion
- > >criterion, clipping the neural net's output when it wanders outside the
- > >region bounded by the training set outputs.
- >
- > If you do want some excursions and you don't want others, this won't
- > work. It is not a simple matter to find the problem regions and bound
- > them specially. I believe it is NP complete to find them. The
- > plausibility of this can be argued as follows: ALNs are a special case
- > of MLPs; you can't tell if an ALN is constant 0 or has some 1 output
- > somewhere (worst case, CNF-satisfiability); this is a special case of
- > a spike that may not be wanted.
- >
- >Huh??? It's trivial to put a widget on each output line that clips the
- >output to lie between all those seen during training.
-
- Sure.
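-
- Something like this per-output clamp is all the widget needs to be
- (a sketch with made-up names, assuming the training-set outputs are
- kept around):
-
-   # Clip an output to lie between the smallest and largest output
-   # values seen during training.
-   def make_clipper(training_outputs):
-       lo, hi = min(training_outputs), max(training_outputs)
-       def clip(y):
-           return max(lo, min(hi, y))
-       return clip
-
-   clip = make_clipper([0.1, 0.4, 0.6, 0.9])
-   print(clip(3.7))   # 0.9 -- a wild excursion gets flattened
-   print(clip(0.5))   # 0.5 -- values inside the range pass through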
-
- >It's a bit harder,
- >but still not NP complete, to find the convex hull of outputs and clip to
- >that.
-
-
- Fine. Unfortunately in both cases above, there can still be excursions in
- the convex hull that you don't want and can't prevent this way.
-
- >Now, if you want some excursions and not others, you'd better tell
- >the net or the net designer about this -- it's a bit unreasonable to expect
- >a backprop net to read your mind.
-
- OK -- this is close to what I meant by "fine control".
-
-
- >
- > The usual MLPs are very "globally" oriented, so they may be good at
- > capturing the global information inherent in training data. The
- > downside is that you can't evaluate the output just taking local
- > information into account...
- > (Does anyone hear soft funeral music?)
- >
- >Well, since you keep pounding on this, I will point out that in most
- >backprop-style nets after training, almost all of the hidden units are
- >saturated almost all of the time. So you can replace them with sharp
- >thresholds and use the same kind of lazy evaluation at runtime that you
- >propose for ALNs: work backwards through the tree and don't evaluate any
- >input sub-trees that can't alter the current unit's state.
-
- You have a lot more experience than I do with sigmoid type nets, so
- what you have just said is extremely significant, in that you are
- coming closer all the time to a logical net. If you are able to
- replace sigmoids with sharp thresholds, and not change the output of
- the net significantly, then you are really using threshold *logic*
- nets. Now let's see what it takes to get lazy evaluation: first of
- all, I think you would have to insist that all the weights on an
- element be positive, and all the signals in the net as well. Otherwise,
- in forming a weighted sum of inputs, you cannot be sure which side of
- the sharp threshold you are on until you have evaluated all the inputs
- (not lazy!). I think the signals would have to be bounded too.
- I think this would be OK. ALNs are still faster, because they don't
- do arithmetic, but ALNs don't have as powerful nodes.
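-
- To make that concrete, here is a sketch (mine, under exactly those
- assumptions: nonnegative weights, input signals bounded in [0, 1]) of
- lazy evaluation for one sharp-threshold element.  Inputs are passed
- as zero-argument functions, so a sub-tree is only evaluated when it
- could still change which side of the threshold we end up on:
-
-   def lazy_threshold(weights, inputs, threshold):
-       partial = 0.0            # contribution of evaluated inputs
-       rest = sum(weights)      # most the unevaluated inputs could add
-       for w, get_input in zip(weights, inputs):
-           if partial >= threshold:        # already over: output 1
-               return 1
-           if partial + rest < threshold:  # can never reach it: 0
-               return 0
-           partial += w * get_input()      # evaluate this input now
-           rest -= w
-       return 1 if partial >= threshold else 0
-
- Both early exits lean on the weights being nonnegative and the
- signals bounded, which is exactly the restriction argued for above.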
-
- One argument for going whole hog into ALNs is that you don't have to
- train using sigmoids, then risk damaging the result of learning by
- going to sharp thresholds. If there were a training procedure for
- networks of the above kind of node with a sharp threshold, that would
- be very promising. I thought backprop required differentiability to
- work though.
-
-
- >Myself, I prefer to think in terms of parallel hardware, so lazy evaluation
- >isn't an issue.
-
- Not true! If you have a fixed amount of hardware, then to do large
- problems, you will have to iterate it. Lazy evaluation lets you skip
- the iterations whose inputs turn out not to be needed, so laziness
- is still very useful. The speedup factor compared to complete evaluation
- grows with the size of the problem.
-
- >Yes, sigmoid unit hardware is a bit more expensive to
- >implement than simple gates, but I don't need nearly as many of them.
-
- Hardly seems worthwhile to keep sigmoids if almost all of your units
- are almost always saturated though. ALNs may have to train with lots
- of nodes, but after training, we collapse entire subtrees of adaptive
- nodes into just a single transistor for execution.
-
- Thanks.
-
- Bill
- --
- ***************************************************
- Prof. William W. Armstrong, Computing Science Dept.
- University of Alberta; Edmonton, Alberta, Canada T6G 2H1
- arms@cs.ualberta.ca Tel(403)492 2374 FAX 492 1071
-