Path: sparky!uunet!ogicse!cse.ogi.edu!stever
From: stever@cse.ogi.edu (Steve Rehfuss)
Newsgroups: comp.ai.neural-nets
Subject: Re: Training classification with uneven categories
Message-ID: <40358@ogicse.ogi.edu>
Date: 21 Jul 92 17:24:51 GMT
Article-I.D.: ogicse.40358
References: <1992Jul9.060922.28633@iti.gov.sg>
Sender: news@ogicse.ogi.edu
Distribution: world
Organization: Oregon Graduate Institute (formerly OGC), Beaverton, OR
Lines: 38

In article <1992Jul9.060922.28633@iti.gov.sg>, cheekit@iti.gov.sg (Looi Chee Kit) writes:
|> We have been working on a neural network approach to bankruptcy prediction, as
|> a comparison with a statistical model constructed using probit analysis. A set
|> of 6 financial ratios comprises the set of independent variables for the model
|> (which is also the set of input variables for a 3-layer network trained with
|> backprop). Information from matched samples (i.e. 165 non-bankrupt companies
|> and 165 bankrupt companies) is used to fit the probit model. The real-life
|> proportions of bankrupt and non-bankrupt companies are 0.006 and 0.994. My
|> question is: do we want to weight the presentations of data for bankrupt and
|> non-bankrupt companies to reflect in some way the real-life proportions (or
|> the relative misclassification costs) when training the neural network? When
|> we trained our neural network on the matched samples, it was good at
|> predicting the bankruptcy cases but imperfect at predicting the non-bankruptcy
|> cases, resulting in an overall (weighted) accuracy of less than 0.994 (0.994
|> is what we would get by classifying EVERY company as non-bankrupt). This issue
|> seems relevant for applications of neural networks to classification tasks
|> where the real-life proportions of the categories are highly uneven.
|>
|> If you have done any work on this, or have ideas and suggestions, please
|> reply via email to: cheekit@iti.gov.sg
|>
|>
|> ---
|> Chee-Kit LOOI                         | Internet: cheekit@iti.gov.sg
|> Knowledge Systems Lab                 | Bitnet: cheekit@itivax
|> Information Technology Institute      | Tel: (65) 772-0926
|> National Computer Board of Singapore  | Fax: (65) 770-3043

Look at Bourlard & Morgan, "A Continuous Speech Recognition System Embedding MLP
into HMM", in NIPS, 1988 or 1989. They divide the network outputs by the prior
class probabilities; choosing the maximum of the rescaled outputs then corresponds
to choosing the class that makes the data most likely. Other people have done this
too. If you have enough data and an adequate architecture, train to MSE or
cross-entropy (or various other error functions), and don't get stuck in a local
minimum, the outputs converge to p(class|inputs). Dividing by the prior class
probabilities and doing winner-take-all then gives you:
      argmax_c  p(class|inputs) / p(class)
    = argmax_c  p(inputs|class) / p(inputs)
    = argmax_c  p(inputs|class)
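
For concreteness, here is a minimal numpy sketch of that prior-correction trick,
assuming you already have a trained net whose outputs approximate p(class|inputs)
under the class frequencies of the training set; all names below are hypothetical,
not from any particular package:

    import numpy as np

    def prior_corrected_decision(posteriors, train_priors, deploy_priors=None):
        # posteriors: (n_samples, n_classes) network outputs ~ p(class|inputs)
        # under the training-set priors (0.5/0.5 for the matched samples).
        # Dividing by the training priors gives values proportional to the
        # class-conditional likelihoods p(inputs|class); their argmax is the
        # maximum-likelihood rule in the equation above.
        scaled = posteriors / train_priors
        if deploy_priors is not None:
            # Optionally re-weight by the priors expected in the field
            # (0.006 bankrupt / 0.994 non-bankrupt) to decide by the
            # posterior under those priors instead.
            scaled = scaled * deploy_priors
        return np.argmax(scaled, axis=1)

    # Toy usage: two companies, columns = [bankrupt, non-bankrupt]
    posteriors    = np.array([[0.70, 0.30],
                              [0.45, 0.55]])
    train_priors  = np.array([0.5, 0.5])       # matched 165/165 sample
    deploy_priors = np.array([0.006, 0.994])   # real-life proportions
    print(prior_corrected_decision(posteriors, train_priors))                 # ML rule
    print(prior_corrected_decision(posteriors, train_priors, deploy_priors))  # posterior under real priors

Relative misclassification costs could be folded in the same multiplicative way,
as an extra per-class weight on the columns before taking the argmax.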