Tue 22-September-1992 (All rights reserved)
About TS5ST in General (Least absolute deviation multiple regression)
======================
Contents:
1. Introduction
2. General description of statladr
3. Standard errors and goodness of fit statistics
4. Release notes
5. List of the files in the package
1. INTRODUCTION
Add a question mark (?) to the program call for a brief description
of the program.
This package may be used and distributed freely for NON-COMMERCIAL,
NON-INSTITUTIONAL, PRIVATE purposes, provided it is not changed in
any way.
┌────────────────────────────────────────────────────────────────┐
│ For ANY other usage (such as use in a business enterprise or a │
│ university) or the full scale version contact the author for a │
│ personal or a site license. │
└────────────────────────────────────────────────────────────────┘
Please do not distribute any part of this package separately.
Uploading to BBSes is encouraged.
The registered version is strictly for the registrant only.
Identical programs must NOT be running on more than one computer at
a time. Site licensed programs must not be run outside the licensed
site.
The programs are under development. Comments and contacts are
solicited. If you have any questions, please do not hesitate to use
electronic mail for communication.
InterNet address: ts@uwasa.fi (preferred)
Bitnet address: SALMI@FINFUN
The author shall not be liable to the user for any direct, indirect
or consequential loss arising from the use of, or inability to use,
any program or file howsoever caused. No warranty is given that the
programs will work under all circumstances.
Timo Salmi
Professor of Accounting and Business Finance
Faculty of Accounting and Industrial Management
University of Vaasa
P.O. BOX 297, SF-65101 Vaasa, Finland
2. GENERAL DESCRIPTION OF STATLADR (Ver. 1.3)
STATistics: Least Absolute Deviation multiple REGRession analysis
is part of the interactive statistical system by Timo Salmi. It is
the fifth program in the set. The first program in the set is
STATistical MEASures (STATMEAS in TS1STxx.ZIP), which is intended
for univariate analysis. The second program in the set is
STATistics: multiple REGRession analysis (TS2STxx.ZIP). The third
program in the set is STATistics: TRANsformations (STATTRAN in
TS3STxx.ZIP), which can be used for transforming the observations,
and, if necessary, also as an editor. The fourth program in the set
is STATistics: Ranks and CORrelations (STATRCOR in TS4STxx.ZIP).
STATLADR includes a handy built-in help system, which can be
invoked by typing ? at any interactive question. Because of this
built-in help, and the interactive nature of the program's user
interface, no long-winded instructions have been included. (Who
reads instructions anyhow?)
The program performs least absolute deviation (LAD) multiple
regression analysis, that is, estimates the coefficients of
Y = a + b(1)X(1) + ... + b(M)X(M)
from a set of observations. Whereas in ordinary least squares
estimation (OLS) the sum of squared deviations between the
observations and the regression equation is minimized, in LAD
estimation the sum of the absolute deviations between the
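observations and the regression equation is minimized. Least
absolute deviation multiple regression is thus equivalent to the
following linear goal programming problem:

   Min Sum(j=1..n) (Pj + Nj)

subject to

   a + Sum(i=1..M) x(i,j)b(i) + Pj - Nj = y(j),   j = 1,...,n

   Pj >= 0, Nj >= 0,   j = 1,...,n

where x(i,j) are the explaining variables, y(j) is the dependent
variable, and Pj and Nj are the positive and negative parts of the
deviation of observation j, so that at the optimum Pj + Nj equals
its absolute deviation.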
STATLADR finds the estimates of the intercept [a] and the regression
coefficients [b(i)] by solving this linear goal programming problem.
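To illustrate what the LAD criterion does, here is a minimal Python
sketch for the one-variable case Y = a + bX. It is NOT the simplex
based goal programming algorithm that STATLADR uses. It relies on
the property that an optimal LAD line can always be found that
passes through two of the observations (a vertex of the linear
program), and simply tries every pair. The name lad_line and the
sample data are hypothetical.

```python
from itertools import combinations


def lad_line(xs, ys):
    """Fit y = a + b*x by least absolute deviations (brute force).

    Tries the line through every pair of observations and keeps the
    one with the smallest sum of absolute deviations.
    Returns (sum_of_absolute_deviations, a, b).
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line is not a regression line
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        sad = sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))
        if best is None or sad < best[0]:
            best = (sad, a, b)
    return best


# Four points on y = 2x plus one wild observation.
print(lad_line([1, 2, 3, 4, 5], [2, 4, 6, 8, 100]))  # (90.0, 0.0, 2.0)
```

The wild last observation does not pull the fitted line away from
the other four points, which is the usual reason for preferring LAD
to OLS when outliers are suspected.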
If the explaining variables are very similar (multicollinearity),
problems tend to occur both in OLS and LAD regression estimation,
and the estimates become very unstable. Further problems of
significance can arise if the values of the explaining variables are
of a very different scale. To test the reliability of the solution
algorithm two inaccuracy indexes are computed and displayed. These
are called the NON-OPTIMALITY OF THE LP SOLUTION and INACCURACY OF
THE LP SOLUTION. The nearer to zero these figures are, the lower the
probability of computationally weak estimates. Although seldom
reported, these problems are inherent in most (even the top
commercial) statistics packages. For those in the know, the former
index is the sum of the positive coefficients in the optimal
simplex tableau. Mathematically they are all non-positive, but
round-off errors may cause some of them to remain small positive
numbers. The latter is based on recalculating the optimal simplex
tableau from the inverse of the basis matrix, and calculating the
deviation of each item in the recalculated optimal simplex tableau
as compared with the original optimal simplex tableau. The
inaccuracy indexes are calculated as a so-called norm, that is, the
square root of the sum of the squared deviations. This measure is
used because mathematically it represents the length of the
deviation vector.
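The arithmetic behind the two indexes can be sketched in a few
lines of Python. This is only an illustration of the description
above, not STATLADR's own code: the function names are hypothetical,
the tableau is assumed to be given as a list of rows, and exactly
which tableau coefficients enter the first index is an assumption.

```python
import math


def non_optimality(coefficients):
    """Sum of the coefficients that are positive although, in exact
    arithmetic, all of them should be non-positive."""
    return sum(c for c in coefficients if c > 0)


def inaccuracy(original, recalculated):
    """Length (Euclidean norm) of the deviation between the original
    and the recalculated tableau, both given as lists of rows."""
    return math.sqrt(sum((o - r) ** 2
                         for row_o, row_r in zip(original, recalculated)
                         for o, r in zip(row_o, row_r)))
```

Both functions return zero when there is no round-off error at all,
which matches the rule that the nearer to zero the better.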
Furthermore, STATLADR draws both low-resolution and
high-resolution scatter diagrams of the data, and of the regression
analysis results. The low-resolution scatter diagrams are drawn, or
rather written, using ordinary ASCII text, and they can thus be
directed to a file. The high-resolution (graphics) scatter diagrams
can only be displayed on the screen.
The data can either be given from the keyboard or taken from a
file. If the input is to be taken from a file it must first be
prepared with some editor, or some word processor which includes an
option for preparing ordinary ASCII text. (STATTRAN can also be used
for this purpose.)
The data is given to the program in the following format:
X1 X2 X3 !variable names (! denotes a comment)
3.56 6.32 -1.73
5.12 -4.21 9.18
14.2 5.11 0.31
END !END is optional in a file
A missing item in an observation is marked by a hash (#). E.g. if
the first item of the second observation were missing, the
observation should be written as # -4.21 9.18
The items in an observation can be separated with blanks, as in
the above, or with commas (,) e.g. 5.12,-4.21,9.18. The number of
the intervening blanks is irrelevant, and can be customized for
increased readability. Thus e.g.
   5.12 -4.21 9.18
and
   5.12    -4.21     9.18
are equivalent.
A row can be continued using an ampersand (&). E.g. the variables
could be given as
X1 X2 &
X3
Alternatively, * or \ can be used instead of & as the continuation
marker.
Comments can be added to the input data. If ! appears on a line
all text after ! will be considered as a comment.
A header can be entered on each page if output is directed to a
file. To accomplish this, start the very first line of the input file
with a double exclamation mark (!!) and the rest of the line will be
used as the header. Thus !! indicates a header, a single ! an
ordinary comment.
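The input rules above can be summarized as a small Python reader.
This is a sketch under assumptions, not the reader built into
STATLADR: the function name parse_statladr is hypothetical, a
missing item (#) is returned as None, and a continuation marker is
only recognized as the last character of a row.

```python
def parse_statladr(text):
    """Return (header, variable_names, observations) from input text."""
    header = None
    rows = []
    pending = []  # items carried over from a continued row
    for lineno, raw in enumerate(text.splitlines()):
        if lineno == 0 and raw.startswith("!!"):
            header = raw[2:].strip()  # !! on the first line is a header
            continue
        line = raw.split("!", 1)[0].strip()  # drop ! comments
        if not line:
            continue
        if line.upper() == "END":  # END is optional in a file
            break
        continued = line[-1] in "&*\\"  # &, * or \ continues the row
        if continued:
            line = line[:-1]
        pending += line.replace(",", " ").split()  # blanks or commas
        if not continued:
            rows.append(pending)
            pending = []
    names, data = rows[0], rows[1:]
    observations = [[None if item == "#" else float(item) for item in row]
                    for row in data]
    return header, names, observations
```

The first complete row is taken as the variable names and every
later row as one observation.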
The maximum number of variables is 25. The maximum number of
observations is 100 (for each variable). The public domain version,
however, sets the limits at 4 and 50 respectively.
3. STANDARD ERRORS AND GOODNESS OF FIT STATISTICS
This chapter describes the formulas of the new features that were
added to statladr.exe in the updated version 1.1. This chapter was
written by Seppo Pynnönen.
The standard errors of the estimates of the regression
coefficients are calculated as

   std(b) = s * (X'X)^jj,

where X is the n x (M+1) data matrix of x variables with a vector of
ones in the first column, (X'X)^jj denotes the j:th diagonal element
of the inverse of the X'X-matrix, the prime (') stands for the
transpose, and s is an estimate of the standard error of the residual
terms of the regression model. (n stands for the number of
observations, and M for the number of explanatory variables.) Here
we have defined the standard error (s) of the residuals as

   s = 1 / (2 f(m)),

where

   f(m) = 2d / (n (e(m+d) - e(m-d)))

with d defined below, e(j) denoting the ordered residuals, and m
the median point of the ordered residuals. The parameter d depends
on the sample size. In the literature it is suggested that it should
be kept small. Here we have adopted the following convention and
defined d as
d = max[1, n'/6],
where n' = n-M-1 (i.e., the number of residuals which are not zero
by definition due to the LP-solution).
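A rough Python sketch of the calculation of s follows. Several
details are assumptions, because the text does not fix them: n'/6 is
taken as an integer division, the median point m is taken as
position n // 2 (counting from zero) in the ordered residuals, and
all n residuals are ordered, including the zero ones. The function
name is hypothetical.

```python
def lad_residual_scale(residuals, n_explanatory):
    """Estimate s = 1 / (2 f(m)) from the LAD residuals."""
    n = len(residuals)
    e = sorted(residuals)             # the ordered residuals e(j)
    n_free = n - n_explanatory - 1    # n' = n - M - 1
    d = max(1, n_free // 6)           # assumption: integer division
    m = n // 2                        # assumption: 0-based median point
    # f(m) = 2d / (n (e(m+d) - e(m-d))); needs n >= 3 and e(m+d) > e(m-d)
    f_m = 2 * d / (n * (e[m + d] - e[m - d]))
    return 1 / (2 * f_m)
```

For example, with the five residuals -3, -1, 0, 1, 3 and one
explanatory variable, d = 1 and the two residuals used are -1 and 1,
which gives f(m) = 0.2 and s = 2.5.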
The t-values are defined as b(j)/std(b) (j = 0, 1, ..., M, with
b(0), the intercept term), where std(b) is defined in the previous
paragraphs.
The LAD coefficient of determination is defined as

   LAD COEFFICIENT OF DETERMINATION
      = 1 - [Sum(i) |e(i)|] / [Sum(i) |y(i) - Md(y)|]

(cf. the R-square in the OLS-regression), where Md(y) is the median
of y.
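The definition translates directly into Python. The sketch below is
only an illustration with a hypothetical function name; it takes the
observed y values and the residuals e(i) of an already fitted LAD
regression.

```python
from statistics import median


def lad_coefficient_of_determination(y, residuals):
    """1 - Sum|e(i)| / Sum|y(i) - Md(y)|, with Md(y) the median of y."""
    md = median(y)
    return 1 - sum(abs(e) for e in residuals) / sum(abs(v - md) for v in y)
```

A perfect fit (all residuals zero) gives 1, and a model that does no
better than the median of y gives 0.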
4. RELEASE NOTES
Version 1.1: The most important improvements were described in
the previous chapter.
Furthermore, I have corrected a bug, which decreased the maximum
capacity of the program by one observation.
Some minor stylistic improvements have also been made.
Version 1.2: Several improvements to the nuts and bolts of the
user interface.
The new usage of the call is
PROGNAME [/h(elp)] [/iInputFileName] [/oOutputFileName] [/cColumnsPerRow]
(the /c option, which regulates the width of the output, is for
registered versions, only). If you use the /i switch, it stuffs the
InputFileName into the appropriate recall buffer. This means that
when the program asks you for the input file name, you can invoke
the input file name just by pressing the CursorUp key. (The same
goes for the /o switch, respectively.) This is very convenient, if
you use the program many times successively making small changes in
your data in between. (This assumes, of course, that you have a
command line editor like DOSEDIT or CED to recall previous MsDos
commands. These common shareware programs can be obtained from any
well-stocked BBS or FTP site.)
The printer readiness test has been rewritten to be more general.
The earlier test failed for some printers, because the codes the
printers send when they are offline are not standardized.
The "file exists, overwrite?" question is no longer asked when the
output file is prn, in other words when the output is directed to
the printer.
The user now has a choice of a left margin from 0 to 20 blanks
when output is directed to the printer.
The user now has a choice between formfeed and four blank lines
to start each new page of output.
When an input file is not found, the user is given the choice of
listing a directory. The directory routine has been rewritten.
The file ready message now also includes the file size besides
the name.
Version 1.3: The input and output file names can be optionally
given as parameters in the program calls, e.g.
STATLADR /ic:\stat\test.dat /or:\tmp
This option has been improved. The "prefilled" name (e.g.
c:\stat\test.dat) will now automatically appear on the input line
without the need to press the CursorUp key. All you need to do
is to press enter.
Also made some minor internal changes not worth recording.
Rewrote the document files using a 68 column wrap instead of the
former 80 to make the text easier to read and handle. Added the list
of files in the package to the documentation.
5. LIST OF THE FILES IN THE PACKAGE
TS5ST Statistics by T.Salmi. Part V
Filename Comment
-------- --------------------------------
FILE_ID.DIZ Brief characterization of TS5ST
STATLADR.EXE Least abs. deviation regression
STATLADR.INF Document
STATLADR.NWS News announcements about TS5ST
---- ------ ------ -----
0004