Path: sparky!uunet!ogicse!cse.ogi.edu!stever
From: stever@cse.ogi.edu (Steve Rehfuss)
Newsgroups: comp.ai.neural-nets
Subject: Re: Training classification with uneven categories
Message-ID: <40358@ogicse.ogi.edu>
Date: 21 Jul 92 17:24:51 GMT
Article-I.D.: ogicse.40358
References: <1992Jul9.060922.28633@iti.gov.sg>
Sender: news@ogicse.ogi.edu
Distribution: world
Organization: Oregon Graduate Institute (formerly OGC), Beaverton, OR
Lines: 38

In article <1992Jul9.060922.28633@iti.gov.sg>, cheekit@iti.gov.sg (Looi Chee Kit) writes:
|> We have been working on a neural network approach to bankruptcy prediction, as
|> a comparison with a statistical model constructed using probit analysis. A set
|> of 6 financial ratios comprises the set of independent variables for the model
|> (which is also the set of input variables for a 3-layer network trained with
|> backprop). Information from matched samples (i.e. 165 non-bankrupt companies
|> and 165 bankrupt companies) is used to fit the probit model. The real-life
|> proportions of bankrupt and non-bankrupt companies are 0.006 and 0.994. My
|> question is: do we want to weight the presentations of data for bankrupt and
|> non-bankrupt companies to reflect in some way the real-life proportions (or
|> the relative misclassification costs) when training the neural network? When
|> we trained our neural network on the matched samples, it was good at
|> predicting the bankruptcy cases but imperfect at predicting the non-bankruptcy
|> cases, resulting in an overall (weighted) accuracy of less than 0.994 (0.994
|> is what we would get by classifying EVERY company as non-bankrupt). This issue
|> seems relevant for applications of neural networks to classification tasks
|> where the real-life proportions of the categories are highly uneven.
|>
|> If you have done any work on this, or have ideas and suggestions, please
|> reply via email to: cheekit@iti.gov.sg
|>
|>
|> ---
|> Chee-Kit LOOI                         | Internet: cheekit@iti.gov.sg
|> Knowledge Systems Lab                 | Bitnet: cheekit@itivax
|> Information Technology Institute      | Tel: (65) 772-0926
|> National Computer Board of Singapore  | Fax: (65) 770-3043

Look at Bourlard & Morgan, "A Continuous Speech Recognition System Embedding MLP
into HMM", in NIPS, 1988 or 1989. They divide the network outputs by the prior
class probabilities; choosing the maximum of the rescaled outputs then corresponds
to choosing the class that makes the data most likely. Other people have done this
too. If you have enough data and an adequate architecture, train to MSE or
cross-entropy (or various other error functions), and don't get stuck in a local
minimum, the outputs converge to p(class|inputs). Dividing by the prior class
probabilities and doing winner-take-all then gives you:
      argmax_c  p(class|inputs) / p(class)
    = argmax_c  p(inputs|class) / p(inputs)
    = argmax_c  p(inputs|class)
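
For concreteness, here is a minimal numpy sketch of that prior-correction trick,
assuming you already have a trained net whose outputs approximate p(class|inputs)
under the class frequencies of the training set; all names below are hypothetical,
not from any particular package:

    import numpy as np

    def prior_corrected_decision(posteriors, train_priors, deploy_priors=None):
        # posteriors: (n_samples, n_classes) network outputs ~ p(class|inputs)
        # under the training-set priors (0.5/0.5 for the matched samples).
        # Dividing by the training priors gives values proportional to the
        # class-conditional likelihoods p(inputs|class); their argmax is the
        # maximum-likelihood rule in the equation above.
        scaled = posteriors / train_priors
        if deploy_priors is not None:
            # Optionally re-weight by the priors expected in the field
            # (0.006 bankrupt / 0.994 non-bankrupt) to decide by the
            # posterior under those priors instead.
            scaled = scaled * deploy_priors
        return np.argmax(scaled, axis=1)

    # Toy usage: two companies, columns = [bankrupt, non-bankrupt]
    posteriors    = np.array([[0.70, 0.30],
                              [0.45, 0.55]])
    train_priors  = np.array([0.5, 0.5])       # matched 165/165 sample
    deploy_priors = np.array([0.006, 0.994])   # real-life proportions
    print(prior_corrected_decision(posteriors, train_priors))                 # ML rule
    print(prior_corrected_decision(posteriors, train_priors, deploy_priors))  # posterior under real priors

Relative misclassification costs could be folded in the same multiplicative way,
as an extra per-class weight on the columns before taking the argmax.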