Tue 22-September-1992                          (All rights reserved)

About TS5ST in General
(Least absolute deviation multiple regression)
======================

Contents:
 1. Introduction
 2. General description of statladr
 3. Standard errors and goodness of fit statistics
 4. Release notes
 5. List of the files in the package


1. INTRODUCTION

Apply question mark ? with the program call for a brief description
of a program.

This package may be used and distributed freely for NON-COMMERCIAL,
NON-INSTITUTIONAL, PRIVATE purposes, provided it is not changed in
any way.

┌────────────────────────────────────────────────────────────────┐
│ For ANY other usage (such as use in a business enterprise or a │
│ university) or the full scale version contact the author for a │
│ personal or a site license.                                    │
└────────────────────────────────────────────────────────────────┘

Please do not distribute any part of this package separately.
Uploading to BBSes is encouraged.

The registered version is strictly for the registrant only.
Identical programs must NOT be running on more than one computer at
a time. Site licensed programs must not be run outside the licensed
site.

The programs are under development. Comments and contacts are
solicited. If you have any questions, please do not hesitate to use
electronic mail for communication.

InterNet address: ts@uwasa.fi         (preferred)
Bitnet address:   SALMI@FINFUN

The author shall not be liable to the user for any direct, indirect
or consequential loss arising from the use of, or inability to use,
any program or file howsoever caused. No warranty is given that the
programs will work under all circumstances.

Timo Salmi
Professor of Accounting and Business Finance
Faculty of Accounting and Industrial Management
University of Vaasa
P.O. BOX 297, SF-65101 Vaasa, Finland


2. GENERAL DESCRIPTION OF STATLADR (Ver. 1.3)

STATistics: Least Absolute Deviation multiple REGRession analysis is
part of the interactive statistical system by Timo Salmi. It is the
fifth program in the set.
The first program in the set is STATistical MEASures (STATMEAS in
TS1STxx.ZIP), which is intended for univariate analysis. The second
program in the set is STATistics: multiple REGRession analysis
(TS2STxx.ZIP). The third program in the set is STATistics:
TRANsformations (STATTRAN in TS3STxx.ZIP), which can be used for
transforming the observations, and, if necessary, also as an editor.
The fourth program in the set is STATistics: Ranks and CORrelations
(STATRCOR in TS4STxx.ZIP).

STATLADR includes a handy built-in help system, which can be invoked
by typing ? at any interactive question. Because of this built-in
help, and the interactive nature of the program's user interface, no
long-winded instructions have been included. (Who reads instructions
anyhow?)

The program performs least absolute deviation (LAD) multiple
regression analysis, that is, estimates the coefficients of
   Y = a + b(1)X(1) + ... + b(M)X(M)
from a set of observations.

Whereas in ordinary least squares estimation (OLS) the sum of
squared deviations between the observations and the regression
equation is minimized, in LAD estimation the sum of the absolute
deviations between the observations and the regression equation is
minimized. Least absolute deviation multiple regression is thus
equivalent to the following linear goal programming problem:

            n
      Min  Sum (Pj + Nj)
           j=1

 subject to               ┌────┬─ deviations (Pj >= 0, Nj >= 0)
          M               │    │
     a + Sum x(i,j)b(i) + Pj - Nj = y(j)     for j = 1,...,n
         i=1 │                        │
             └─ explaining variables  └─ dependent variable

For each observation j at most one of Pj and Nj is non-zero at the
optimum, so Pj + Nj is the absolute deviation of observation j.

STATLADR finds the estimates of the intercept [a] and the regression
coefficients [b(i)] by solving this linear goal programming problem.

If the explaining variables are very similar (multicollinearity),
problems tend to occur both in OLS and LAD regression estimation,
and the estimates become very unstable. Significant further problems
can arise if the values of the explaining variables are on very
different scales.
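
As an illustration of what the minimization above produces, here is
a minimal Python sketch. It is NOT the simplex-based method STATLADR
uses. It relies on a general property of linear programs: an optimal
solution can be found at a vertex, which here corresponds to a fit
passing exactly through M+1 of the observations (assuming the data
are not degenerate). The sketch therefore tries every such subset
and keeps the fit with the smallest sum of absolute deviations. The
function names solve_linear and lad_fit are hypothetical, and the
brute-force search is practical for small data sets only.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a small square system a*z = b by Gauss-Jordan elimination.

    Returns None if the system is (numerically) singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: pick the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def lad_fit(xs, ys):
    """Brute-force LAD regression (illustrative sketch only).

    xs: list of observations, each a list of M explanatory values.
    ys: list of the dependent variable's values.
    Returns ([a, b(1), ..., b(M)], sum of absolute deviations),
    or None if no subset of observations gives a solvable system.
    """
    rows = [[1.0] + [float(v) for v in x] for x in xs]
    k = len(rows[0])  # M + 1 coefficients including the intercept
    best = None
    for subset in combinations(range(len(rows)), k):
        coeffs = solve_linear([rows[j] for j in subset],
                              [float(ys[j]) for j in subset])
        if coeffs is None:
            continue  # these observations do not determine a fit
        total = sum(abs(y - sum(c * v for c, v in zip(coeffs, row)))
                    for row, y in zip(rows, ys))
        if best is None or total < best[1]:
            best = (coeffs, total)
    return best
```

For example, lad_fit([[0], [1], [2], [3], [4]], [1, 3, 5, 7, 100])
gives the line Y = 1 + 2X with a sum of absolute deviations of 91:
the single outlying observation does not pull the line away from the
other four.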

To test the reliability of the solution algorithm two inaccuracy
indexes are computed and displayed. These are called the
NON-OPTIMALITY OF THE LP SOLUTION and INACCURACY OF THE LP SOLUTION.
The nearer to zero these figures are, the lower the probability of
computationally weak estimates. Although seldom reported, these
problems are inherent in most (even the top commercial) statistics
packages.

For those in the know, the former index is the sum of positive
coefficients in the optimal simplex tableau. Mathematically they all
are non-positive, but round-offs may cause some of them to remain
small positive numbers. The latter is based on recalculating the
optimal simplex tableau from the inverse of the basis matrix, and
calculating the deviation of each item in the recalculated optimal
simplex tableau as compared with the original optimal simplex
tableau. The inaccuracy indexes are calculated as a so-called norm,
that is, the square root of the sum of the squared deviations. This
measure is used because mathematically it represents the length of
the deviation vector.

Furthermore, STATLADR draws both low-resolution and high-resolution
scatter diagrams of the data, and of the regression analysis
results. The low-resolution scatter diagrams are drawn, or rather
written, using ordinary ascii text, and they can thus be directed to
a file. The high-resolution (graphics) scatter diagrams can only be
displayed on the screen.

The data can either be given from the keyboard or taken from a file.
If the input is to be taken from a file it must first be prepared
with some editor, or some word processor which includes an option
for preparing ordinary ascii text. (Also STATTRAN can be used for
this purpose.) The data is given to the program in the following
format:

   X1     X2     X3     !variable names (! denotes a comment)
   3.56   6.32  -1.73
   5.12  -4.21   9.18
   14.2   5.11   0.31
   END                  !END is optional in a file

A missing item in an observation is marked by a hash (#). E.g.
if the first item of the second observation were missing, the
observation should be written as
   #     -4.21   9.18

The items in an observation can be separated with blanks, as in the
above, or with commas (,) e.g. 5.12,-4.21,9.18. The number of the
intervening blanks is irrelevant, and can be customized for
increased readability. Thus e.g.
   5.12 -4.21 9.18
and
   5.12    -4.21    9.18
are equivalent.

A row can be continued using an ampersand (&). E.g. the variables
could be given as
   X1 X2 &
   X3
Alternatively, * or \ can be used instead of & as the continuation
marker.

Comments can be added to the input data. If ! appears on a line all
text after ! will be considered as a comment.

A header can be entered on each page if output is directed to a
file. To accomplish this start the very first line of the input file
with a double exclamation mark (!!) and the rest of the line will be
used as the header. Thus !! indicates a header, a single ! an
ordinary comment.

The maximum number of variables is 25. The maximum number of
observations is 100 (for each variable). The public domain version,
however, sets the limits at 4 and 50 respectively.


3. STANDARD ERRORS AND GOODNESS OF FIT STATISTICS

This chapter describes the formulas of the new features that were
added to statladr.exe in the updated version 1.1. This chapter has
been written by Seppo Pynnönen.

The standard errors of the estimates of the regression coefficients
are calculated as
                      jj
     std(b) = s * X'X   ,

where X is the n x (M+1) data matrix of x variables with a vector of
                             jj
ones in the first column, X'X   denotes the j:th diagonal element of
the inverse of the X'X-matrix and the prime (') stands for the
transpose, s is an estimate of the standard error of the residual
terms of the regression model. (n stands for the number of
observations, and M for the number of explanatory variables.)
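
The matrix term in the formula above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions,
not the code used in statladr.exe: the function name
inverse_diagonal is hypothetical, and the residual scale s is left
out because it is defined separately in this chapter. The function
builds X with a column of ones, forms X'X, inverts it by
Gauss-Jordan elimination and returns the diagonal elements of the
inverse, one per coefficient (intercept first).

```python
def inverse_diagonal(xs):
    """Return the diagonal elements of the inverse of X'X.

    xs: list of observations, each a list of M explanatory values.
    X is formed by putting a 1 in front of each observation, so the
    result has M + 1 elements, the first one for the intercept.
    """
    rows = [[1.0] + [float(v) for v in x] for x in xs]
    k = len(rows[0])
    # X'X: sums of products of the columns of X.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    # Augment with the identity matrix and reduce [X'X | I] to
    # [I | inverse of X'X].
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [aug[j][k + j] for j in range(k)]
```

A singular X'X is reported as an error; this is the extreme case of
the multicollinearity problem mentioned in chapter 2.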

Here we have defined the standard error (s) of the residuals as

            1
     s = ------,
          2f(m)

where
                      2d
     f(m) = -------------------
            n(e      - e     )
               (m+d)    (m-d)

with d defined below, e    denote the ordered residuals, and m is
                       (j)
the median point of the ordered residuals.

The parameter d depends on the sample size. In the literature it is
suggested that it should be kept small. Here we have adopted the
following convention and defined d as

     d = max[1, n'/6],

where n' = n-M-1 (i.e., the number of residuals which are not zero
by definition due to the LP-solution).

The t-values are defined as b(j)/std(b) (j = 0, 1, ..., M, with b(0)
the intercept term), where std(b) is defined in the previous
paragraphs.

The LAD coefficient of determination is defined as

                                           Sum |e(i)|
                                            i
 LAD COEFFICIENT OF DETERMINATION = 1 - ------------------
                                        Sum |y(i) - Md(y)|
                                         i

(cf. the R-square in the OLS-regression), where Md(y) is the median
of y.


4. RELEASE NOTES

Version 1.1: The most important improvements were described in the
previous chapter. Furthermore, I have corrected a bug, which
decreased the maximum capacity of the program by one observation.
Some minor stylistic improvements have also been made.

Version 1.2: Several improvements to the nuts and bolts of the user
interface. The new usage of the call is
   PROGNAME [/h(elp)] [/iInputFileName] [/oOutputFileName]
            [/cColumnsPerRow]
(the /c option, which regulates the width of the output, is for
registered versions only). If you use the /i switch, it stuffs the
InputFileName into the appropriate recall buffer. This means that
when the program asks you for the input file name, you can invoke
the input file name just by pressing the CursorUp key. (The same
goes for the /o switch, respectively.) This is very convenient, if
you use the program many times successively making small changes in
your data in between. (This assumes, of course, that you have a
command line editor like DOSEDIT or CED to recall previous MsDos
commands.
These common shareware programs can be obtained from any
well-stocked BBS or FTP site.)

The printer readiness test has been rewritten to be more general.
The earlier test failed for some printers, because the codes the
printers send when they are offline are not standardized.

The "file exists, overwrite?" question is no longer asked when the
output file is prn, in other words when the output is directed to
the printer.

The user now has a choice of a left margin from 0 to 20 blanks when
output is directed to the printer.

The user now has a choice between formfeed and four blank lines to
start each new page of output.

When an input file is not found, the user is given the choice of
listing a directory. The directory routine has been rewritten.

The file ready message now also includes the file size besides the
name.

Version 1.3: The input and output file names can be optionally given
as parameters in the program calls, e.g.
   STATLADR /ic:\stat\test.dat /or:\tmp
This option has been improved. The "prefilled" name (e.g.
c:\stat\test.dat) will now automatically appear on the input line
without the need of pressing the cursor up key. All you need to do
is to press enter.

Also made some minor internal changes not worth recording.

Rewrote the document files using a 68 column wrap instead of the
former 80 to make the text easier to read and handle.

Added the list of files in the package to the documentation.


5. LIST OF THE FILES IN THE PACKAGE

TS5ST Statistics by T.Salmi. Part V

Filename      Comment
--------      --------------------------------
FILE_ID.DIZ   Brief characterization of TS5ST
STATLADR.EXE  Least abs. deviation regression
STATLADR.INF  Document
STATLADR.NWS  News announcements about TS5ST
----          ------ ------ -----
0004