home *** CD-ROM | disk | FTP | other *** search
- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!darwin.sura.net!paladin.american.edu!auvm!CCDOSH1.EM.CDC.GOV!DXA6
- Encoding: 71 TEXT
- X-Mailer: Microsoft Mail V3.0
- Message-ID: <2B4C2012@router.em.cdc.gov>
- Newsgroups: bit.listserv.stat-l
- Date: Thu, 7 Jan 1993 16:09:00 EST
- Sender: STATISTICAL CONSULTING <STAT-L@MCGILL1.BITNET>
- From: dxa6@CCDOSH1.EM.CDC.GOV
- Subject: Spreadsheets and calculators, again
- Lines: 69
-
- *Forgive me if this is a duplicate posting, but I have reason to believe my
- earlier effort failed to transmit.*
-
- Here I go again, sticking my neck out. Let me begin with a disclaimer: I
- agree with most of what has been said about the perils of using
- spreadsheets for statistical calculations. There are certainly many
- pitfalls.
-
- When I confessed that I had used spreadsheets successfully under some
- circumstances, I had no doubt that my data was "well-behaved," and with
- small sample sizes, it was easy to check. I used a spreadsheet because it
- was the best tool I had available at the moment, and I needed a quick
- answer. I was also careful to set up the calculations to include all the
- intermediate values, just as if I was doing it by hand, in order to have at
- least some check for overflows and other errors. The purists on the list
- think that what I did was "wrong," or at least too risky to tolerate.
-
- A number of people provided me with private responses containing examples
- where spreadsheets are likely to break down doing statstical computations.
- One ofthe most frequently cited was the formula for variance. A couple of
- people pointed out that the usual computational formula,
- 1/n * (sum(x^2) - sum(x)^2)/n
- required the accumulation of the sum of the squares term, which could easily
- overflow or lose precision, and that a better (though not perfect) formula
- would be the general,
- sum((x-mean(x))^2/n
- which by subtracting the mean before taking the square, was less likely to
- overflow. This in turn lead me to wonder about the differences between how
- my old pocket calculator handled such calculations vs. a more modern PC,
- since the calculator used the usual computational formula.
-
- As several people pointed out, calculators overflow with factorials greater
- than 69! (Phil Miller claimed the same limit for an IBM PC, model
- unspecified.) I clearly recall that the same was true of my old HP 67, but it
- has long since gone to the hardware graveyard. (My current HP 11C also
- overflows at 70!.) I did save my old HP 67 and even older HP 55 manuals,
- because they contained a number of algorithms for statistical computing
- on pocket calculators, designed to circumvent some of the pitfalls that we
- are all worried about.
-
- On my Macintosh IIci at home (Motorola 68030 with MC68883 floating-point math
- coprocessor) using a Microsoft Excel 3.0 spreadheet, I tried factorials up to
- 100! with no apparent problems:
- 100! = 9.332622E+157 and 99!=9.332622E+155, while 100!/99! = 100.000000
- (yes, I divided the output from the previous individual calculations), so it
- appears to be correctly handling much larger numbers than a pocket
- calculator, or an older IBM PC can. (Duncan Murdoch today said that some PCs
- may be able to handle numbers up to 170! or 1754!.)
-
- The built in algorithm for variance (of a population sample) in Excel 3.0 is:
- (n (sum(x^2)) - (sum(x))^2) / (n(n-1))
- (There is also a VARP function which calculates variance for an entire
- population, but the formula is not explicitly given, so one can only assume
- it uses n and not n-1 in the denomonator.) As a couple of people noted, this
- formula is not the best. But the overflow problem may be mitigated somewhat
- by the capabilities of the CPU and coprocessor.
-
- None of this changes the rule of thumb that spreadsheets for statistics are
- risky. My only point is that for every rule, there are exceptions. ;-)
- What I, and probably many others in the world, would like to know are some
- more explicit guidelines on the matter; and as I said, I intend to explore
- this area some more on my own.
-
- --Dave
- _____________________________________________________________________________
- David Arday, M.D., M.P.H
- Office on Smoking and Heath
- Centers for Disease Control and Prevention
- dxa6@ccdosh1.em.cdc.gov
-