NetNews Usenet Archive 1993 #1

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1993 #1 / NN_1993_1.iso / spool / bit / listserv / statl / 2335 < prev next >

Wrap

Text File | 1993-01-08 | 4.6 KB | 97 lines

Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU Path: sparky!uunet!spool.mu.edu!wupost!howland.reston.ans.net!paladin.american.edu!auvm!CBIS.ECE.DREXEL.EDU!CHARLES.HAAS Message-ID: <9301081401.AA01015@cbis.ECE.Drexel.EDU> Newsgroups: bit.listserv.stat-l Date: Fri, 8 Jan 1993 09:06:51 -1000 Sender: STATISTICAL CONSULTING <STAT-L@MCGILL1.BITNET> Comments: Warning -- original Sender: tag was chaas@CBIS.ECE.DREXEL.EDU From: Charles Haas <Charles.Haas@CBIS.ECE.DREXEL.EDU> Subject: Roundoff of various formulae in spreadsheets Lines: 85 There has been some discussion of late on the relative numerical precision of various algorithms, including a discussion of the "unreliability" of built-in spreadsheet functions at computing routine statistical tests. I decided to perform a simple numerical experiment, using Excel version 3.0 for the Macintosh (on a IIci). Consider the data set comprised of three points: (x, x+e, x-e). I set x to 1E20 and looked at the standard deviation (using the builtin STDEV function) versus various values of e. I also used the "mean shift" method of computing the standard deviation, i.e. s = sqrt((sum(xi-xmean)^2)/(N-1)) The exact result should yield a standard deviation equal to "e". The following is the result: e excel stdev stdev(mean shift) 1.00E+13 9.9565E+12 1E+13 5E+12 4.9172E+12 (2% err) 5E+12 2E+12 1.5549E+12 (22 % err) 2E+12 1E+12 0 1E+12 5E+11 0 5E+11 2E+11 0 2E+11 1E+11 0 1E+11 5E+10 0 5E+10 2E+10 0 2E+10 1E+10 0 1E+10 5000000000 0 5000003584 2000000000 0 1999994880 1000000000 0 999997440 500000000 0 500006912 200000000 0 199999488 100000000 0 100007936 50000000 0 50003968 20000000 0 20004864 10000000 0 9994240 5000000 0 4997120 2000000 0 1998848 1000000 0 999424 500000 0 507904 (1 % err) 200000 0 196608 (2 % err) 100000 0 98304 50000 0 49152 20000 0 16384 (15 % err) 10000 0 16384 (60 % err) 5000 0 0 Conclusions: 1) The built-in Excel function should be suspect if the interquartile range (a robust scale estimator) is very small (particularly less than one part in one million) relative to the mean. 2) The mean shift method (naive computation) is more reliable, down to about one part in 1E15. Questions: The computation is run on a machine with a FPU. I wonder whether or not there might be a difference on a machine without an FPU. Also, is the accuracy on other platforms (DOS) the same? Comment: The computing formula for the standard deviation is often justified (from the days of slide rules and machines in which memory limitations were severe, e.g. calculators) as simpler (being "one pass") compared to the defining equation for standard deviation. However the above shows a clear and substantial (at least in applications at which the deviation is below the ppm relative range) difference in the methods. {=== Perhaps I have just replowed old ground? =================} /-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/ -/-/-/ / Charles N. Haas / / Betz Professor of Environmental Engineering / / Drexel University / / Philadelphia, PA 19104 / / / / Internet: CHAAS@CoE.Drexel.Edu (preferred) / / Bitnet: HAASCN@DUVM.BITNET / /-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/-/