- Newsgroups: sci.research
- Path: sparky!uunet!mcsun!Germany.EU.net!unibwh.unibw-hamburg.de!unibwh!p_misiak
- From: p_misiak@unibwh.unibw-hamburg.de (Carlo Misiak)
- Subject: Re: Newsweek Article: is Science Censored?
- In-Reply-To: paj@uk.co.gec-mrc's message of 14 Sep 92 09:54:17 GMT
- Message-ID: <P_MISIAK.92Sep15100744@grafix.unibwh.unibw-hamburg.de>
- Sender: news@unibw-hamburg.de
- Organization: University of Federal Armed Forces Hamburg
- References: <1992Sep12.145210.694@cs.brown.edu> <1955@snap>
- Date: Tue, 15 Sep 1992 10:07:44 GMT
- Lines: 33
-
- In article <1955@snap> paj@uk.co.gec-mrc (Paul Johnson) writes:
-
-    By the way, the reason that trawling through statistics looking for
-    correlations is dangerous is as follows:
-
-    A significant correlation is one greater than 95%. But if you look at
-    random data in 20 different ways, you are going to find a significant
-    correlation 1 - 0.95^20 = 0.64 of the time. In other words a study
-    looking at random data 20 different ways has a greater than evens
-    chance of finding something "significant".
-
-    Disclaimer: I am not a statistician. I hope I got that right.
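-
- The arithmetic in the quoted paragraph can be checked in a few lines. This is
- only a sketch: the function name is made up for illustration, and it assumes
- the 20 looks at the data are independent tests, each at the 5% level.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# 20 independent looks at pure noise, each tested at the 5% level.
rate_20 = familywise_error_rate(0.05, 20)
print(f"{rate_20:.2f}")
```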
-
- It is actually worse than that (your statement is, at the very least, somewhat
- fuzzy). If your N is large enough, a correlation as small as r = .3 will come
- out significant, yet it gives an r^2 of only 0.09: the correlation accounts for
- 9% of the variance, which implies that you can forget it.
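-
- To see how "significant" and "worth keeping" come apart, here is a rough
- sketch. It uses the Fisher z-transformation as a normal approximation to the
- test of r against zero (my choice for the illustration; an exact t-test would
- give slightly different p-values), and the sample sizes are arbitrary.

```python
import math


def fisher_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0, via Fisher's z."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


r = 0.3
for n in (20, 50, 100, 1000):
    print(f"N = {n:4d}   p = {fisher_p_value(r, n):.4g}")

# Whatever N is, the share of variance explained stays the same.
print(f"r^2 = {r * r:.2f}")
```

- The p-value shrinks as N grows, but r^2 does not move.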
-
- Now if you have a large correlation matrix of, say, 100 by 100 elements and you
- test at the 5% level, you will find about 500 significant but discardable entries
- by chance alone (roughly 250 distinct pairs, since the matrix is symmetric).
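-
- A small simulation makes the count concrete. Everything here is an assumption
- for illustration: the variables are independent Gaussian noise, there are 50
- observations per variable (no N is given above), the cutoff for |r| comes from
- the Fisher z approximation, and the function names are invented.

```python
import math
import random
from statistics import NormalDist


def standardize(xs):
    """Center a sample and scale it to unit length, so dot products are r."""
    mean = sum(xs) / len(xs)
    centered = [x - mean for x in xs]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def count_chance_hits(n_vars=100, n_obs=50, alpha=0.05, seed=0):
    """Count 'significant' correlations among mutually independent variables."""
    rng = random.Random(seed)
    data = [standardize([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
            for _ in range(n_vars)]
    # Approximate critical |r| at level alpha (Fisher z, two-sided).
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    r_crit = math.tanh(z_crit / math.sqrt(n_obs - 3))
    hits = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            r = sum(a * b for a, b in zip(data[i], data[j]))
            if abs(r) > r_crit:
                hits += 1
    n_pairs = n_vars * (n_vars - 1) // 2
    return hits, n_pairs


hits, n_pairs = count_chance_hits()
print(f"{hits} of {n_pairs} distinct pairs significant; "
      f"expected about {0.05 * n_pairs:.0f}")
```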
-
- Then imagine you have 1000 variables and you run through the procedure 20 times,
- each time selecting another subset of 100 variables out of the original 1000.
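-
- The repeated-subset procedure can be sketched the same way. Again this is only
- an illustration under assumed conditions: pure Gaussian noise, 30 observations
- per variable, and an invented function name. It records the strongest spurious
- correlation turned up in each of the 20 runs.

```python
import math
import random


def standardize(xs):
    """Center a sample and scale it to unit length, so dot products are r."""
    mean = sum(xs) / len(xs)
    centered = [x - mean for x in xs]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def strongest_per_run(n_total=1000, n_subset=100, n_runs=20, n_obs=30, seed=1):
    """Largest |r| found in each run over a random subset of noise variables."""
    rng = random.Random(seed)
    pool = [standardize([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
            for _ in range(n_total)]
    best = []
    for _ in range(n_runs):
        subset = rng.sample(pool, n_subset)
        strongest = 0.0
        for i in range(n_subset):
            for j in range(i + 1, n_subset):
                r = abs(sum(a * b for a, b in zip(subset[i], subset[j])))
                strongest = max(strongest, r)
        best.append(strongest)
    return best


best = strongest_per_run()
for run, r in enumerate(best, start=1):
    print(f"run {run:2d}: strongest |r| = {r:.2f}")
```

- Under these assumptions every run should hand you at least one impressive
- looking correlation, although nothing in the data is related to anything else.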
-
- Call the results scientific results that *prove* something.
-
- Cheers
-
- --
- Carlo Misiak
-
- *** All that we C or Scheme is but a mind in the machine *** (remember POE) ***
-
-