home *** CD-ROM | disk | FTP | other *** search
- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!spool.mu.edu!wupost!howland.reston.ans.net!paladin.american.edu!auvm!CBIS.ECE.DREXEL.EDU!CHARLES.HAAS
- Message-ID: <9301081401.AA01015@cbis.ECE.Drexel.EDU>
- Newsgroups: bit.listserv.stat-l
- Date: Fri, 8 Jan 1993 09:06:51 -1000
- Sender: STATISTICAL CONSULTING <STAT-L@MCGILL1.BITNET>
- Comments: Warning -- original Sender: tag was chaas@CBIS.ECE.DREXEL.EDU
- From: Charles Haas <Charles.Haas@CBIS.ECE.DREXEL.EDU>
- Subject: Roundoff of various formulae in spreadsheets
- Lines: 85
-
- There has been some discussion of late on the relative numerical precision
- of various algorithms, including a discussion of the "unreliability" of
- built-in spreadsheet functions at computing routine statistical tests. I
- decided to perform a simple numerical experiment, using Excel version 3.0
- for the Macintosh (on a IIci).
-
- Consider the data set comprised of three points: (x, x+e, x-e). I set x to
- 1E20 and looked at the standard deviation (using the builtin STDEV
- function) versus various values of e. I also used the "mean shift" method
- of computing the standard deviation, i.e.
-
- s = sqrt((sum(xi-xmean)^2)/(N-1))
-
- The exact result should yield a standard deviation equal to "e".
-
- The following is the result:
-
- e excel stdev stdev(mean shift)
- 1.00E+13 9.9565E+12 1E+13
- 5E+12 4.9172E+12 (2% err) 5E+12
- 2E+12 1.5549E+12 (22 % err) 2E+12
- 1E+12 0 1E+12
- 5E+11 0 5E+11
- 2E+11 0 2E+11
- 1E+11 0 1E+11
- 5E+10 0 5E+10
- 2E+10 0 2E+10
- 1E+10 0 1E+10
- 5000000000 0 5000003584
- 2000000000 0 1999994880
- 1000000000 0 999997440
- 500000000 0 500006912
- 200000000 0 199999488
- 100000000 0 100007936
- 50000000 0 50003968
- 20000000 0 20004864
- 10000000 0 9994240
- 5000000 0 4997120
- 2000000 0 1998848
- 1000000 0 999424
- 500000 0 507904 (1 % err)
- 200000 0 196608 (2 % err)
- 100000 0 98304
- 50000 0 49152
- 20000 0 16384 (15 % err)
- 10000 0 16384 (60 % err)
- 5000 0 0
-
- Conclusions:
-
- 1) The built-in Excel function should be suspect if the interquartile range
- (a robust scale estimator) is very small (particularly less than one part
- in one million) relative to the mean.
-
- 2) The mean shift method (naive computation) is more reliable, down to
- about one part in 1E15.
-
- Questions:
-
- The computation is run on a machine with a FPU. I wonder whether or not
- there might be a difference on a machine without an FPU. Also, is the
- accuracy on other platforms (DOS) the same?
-
- Comment:
-
- The computing formula for the standard deviation is often justified (from
- the days of slide rules and machines in which memory limitations were
- severe, e.g. calculators) as simpler (being "one pass") compared to the
- defining equation for standard deviation. However the above shows a clear
- and substantial (at least in applications at which the deviation is below
- the ppm relative range) difference in the methods.
-
- {=== Perhaps I have just replowed old ground? =================}
-
- /-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/ -/-/-/
- / Charles N. Haas /
- / Betz Professor of Environmental Engineering /
- / Drexel University /
- / Philadelphia, PA 19104 /
-
- /
- /
- / Internet: CHAAS@CoE.Drexel.Edu (preferred) /
- / Bitnet: HAASCN@DUVM.BITNET /
- /-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/
-