MULTIVARIATE ANALYSIS PACKAGE 1.6
Copyright 1985,86,87 Douglas L. Anderton
Department of Sociology
University of Chicago
1126 E. 59th Street
Chicago, IL 60637
These programs are released for distribution so long as 1) any
charges involved do not exceed costs of media and mailing, and 2) no
portion of the programs is used for commercial resale.
Revision History:
01/12/87 - added codebooks for variable names and missing values; see
usage documentation below.
01/10/87 - major optimization in FACTOR eigen subroutines cut
iterations by about 40% and gave user control of tolerance.
01/09/87 - minor revisions DESCRPT, PLOT, CORREL, PARTIAL, CLUSTER,
HYPOTHS and MANOVA.
01/08/87 - fixed rollover bug in grand totals in CROSSTAB, and minor
optimization.
01/07/87 - substantially optimized TRANSFRM for 16% speed increase.
01/05/87 - fixed bug in REGRESS mean squares, converted to Gaussian
and LU solution.  Modified GETCOR subroutine to get names.
08/25/86 - Modified TRANSFRM to allow leading minus signs on number
entry and numbers up to 11 characters long.
06/25/86 - Fixed bug in group option and histograms in DESCRPT.
05/27/86 - New Release. Buffering added to TRANSFRM, MANOVA program,
Simple 2-dimensional PLOT, and Kmeans CLUSTERing program.
04/21/86 - Added Spicer algorithm and weighted data to CORREL.
04/19/86 - Added (improved accuracy) Spicer algorithm, weighted data
and 'by' group computations to DESCRPT.
03/23/86 - Fixed IFS bug in TRANSFRM and Sped it up considerably.
09/27/85 - Fixed Critical bugs in CORREL with missing values.
09/24/85 - New Release.  Transformations Package, Partial
Correlations, Factor Analysis and Hypotheses Tests.
09/13/85 - Fixed bug which dropped sign of correlations from CORREL
when read into REGRESS if negative.
06/28/85 - Fixed bug in CROSSTAB (unidimensional addressing).
06/26/85 - Fixed bug in CROSSTAB (init row and col tots).
06/22/85 - New Release. CROSSTAB.
06/15/85 - First Release. DESCRPT, CORREL, REGRESS.
INTRODUCTION:
Mapstat is a very serious multivariate statistical analysis package
capable of meeting 90% or more of most users' analytical needs.  The
routines have, at this point, been well tested and provide the most
frequently used procedures of the relatively expensive statistical
packages without cost. Source code is included for modifications
and elaborations at your own risk.
Eleven programs are included in this sixth release of MAP.
1) DESCRPT - descriptive statistics and frequency histograms.
2) CORREL - correlation and covariance matrices.
3) REGRESS - multiple linear regression.
4) CROSSTAB - n-way crosstabulation and association tests.
5) TRANSFRM - data transformations.
6) HYPOTHS - simple hypotheses test on means and variances.
7) PARTIAL - partial correlation coefficients.
8) FACTOR - principal axis factoring with rotations.
9) CLUSTER - kmeans clustering program.
10) PLOT - simple 2-dimensional plots.
11) MANOVA - multiple dependent variable analysis of variance.
Users are encouraged to REPORT BUGS and make REQUESTS for future
versions. Do not release your own versions or modifications using
the copyrighted MAP or MAPSTAT logos - and abide by the above
copyright notice.
HARDWARE REQUIREMENTS:
MAP is written in version 2 (or 3) of Turbo Pascal (Borland Intl).
It has been written to compile with less than 56k TPA for those
running ZCPR3 or an alternative OS on 8-bit machines.
Only a few statements need to be altered to run the programs on MSDOS
machines.  Change the BDOS(0) calls to EXIT and try to compile.  As I
recall, only two or three other lines out of all the code herein need
to be changed for MSDOS version 3 Turbo.
PLOT contains printer control codes for the EPSON MX80 in procedure
Openfiles; modify these codes to suit your printer.
DESIGN PHILOSOPHY:
First, MAP is written as a sequential case processor to avoid memory
resident storage and achieve the greatest speed possible.  This has
several consequences: 1) the package contains powerful statistical
analysis programs without horrendous memory requirements; 2) the
cost, however, is that for redundant functions such as histograms,
regression residuals, etc., the package currently requires multiple
passes over the data.  Even for large data sets the programs are
sufficiently fast to make such passes reasonable.
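The sequential-case idea can be sketched with a one-pass running mean
and variance update.  This is an illustrative sketch in Python, not
MAPSTAT's actual code (MAPSTAT is written in Turbo Pascal), and the
update shown is the well-known Welford recurrence rather than a
transcription of MAPSTAT's routines:

```python
# One-pass (sequential-case) mean and variance: each case is read
# once and discarded, so memory use is constant regardless of the
# number of cases.  Illustrative only; not MAPSTAT source.
def one_pass_stats(cases):
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in cases:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance

n, mean, var = one_pass_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(n, mean, var)
```

A histogram or residual pass, as noted above, would simply re-read
the same file a second time rather than hold the cases in memory.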
INPUT DATA REQUIREMENTS:
MAP expects to find your data in a free format with at least one
blank separating each variable and a newline at the end of each
line. All variables for each case must be on a single line, i.e.
newlines separate records. It will not accept alphanumeric data.
Programs assume all data transformation has been performed (e.g.
CROSSTAB expects a finite number of values, not necessarily integer
values).  These are the only data requirements.
Codebook files containing variable names and missing values are also
allowed, see 'running the programs' below.
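A data file in the expected free format, and a minimal reader for it,
might look like this.  The reader is a Python sketch for illustration
only (the two-line data sample is made up); MAPSTAT's own input
routines are in Turbo Pascal:

```python
# Free-format numeric data: blanks separate variables, newlines
# separate cases, and every case carries all of its variables on
# one line.  Hypothetical two-case, three-variable file.
raw = """\
1.5  2  -9
3.25 4  5.0
"""

def read_cases(text):
    cases = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            cases.append([float(tok) for tok in line.split()])
    return cases

print(read_cases(raw))
```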
COMPILING THE PROGRAMS:
Use your Turbo Pascal (Borland Intl) compiler to compile the
programs with the options set to a .COM file for MAPSTAT and a .CHN
file for all others. Rename all except MAPSTAT to the names given
in the file MAPSTAT.PAS. If you plan to run these programs under
ZEX control (highly recommended) then be sure to compile them under
ZEX.  This is done by putting all the distribution files along with
your Turbo compiler in a common access area and running the
MINSTALL.ZEX file included.  Alternatively, to compile one at a time,
but under ZEX, just enter:
>ZEX
:TURBO
:<carriage return>
and then proceed as you normally would.
RUNNING THE PROGRAMS:
1. Data Input and Output Files -
After invoking the programs they will ask for the name of an input
data file (or a file created from a prior MAP run - for example, the
output of CORREL is used by REGRESS), and the name of an output
file. For printer output specify the filename as LST: and for
screen output specify CON:.  An exception is TRANSFRM, which uses
buffered output routines; it will accept LST: and CON: but will send
output to LIST.TMP and CONSOLE.TMP disk files respectively.
2. Codebook Variable Description Files -
If the input to the program is raw data (i.e. it is not one of the
procedures which input a prior CORREL matrix), then the program will
ask for a codebook file.  The codebook file contains three items of
input for each variable in the data file: (1) the column number, (2)
a variable name of eight characters, and (3) a missing value code.
One line must be provided for each variable in the data file (whether
it is used in this particular analysis or not).  All three items must
be provided for each variable on a new line and separated by blanks.
For example,
1 THISIS1 -9
2 HERESTWO -1E37
(etc.)
Note that eight spaces must be allowed for variable names; leave
blanks if necessary to fill out the string.  Note also that a
missing value code must be given for every variable.  The example
above used MAPSTAT's default value of -1E37 for missing data; this
or another equally implausible value may be given in the codebook.
Alternatively, if the user specifies 'none' in answer to the
codebook file query, variable names will default to variable numbers
and the default missing value will be assumed. This is not a
recommended option if you will return to your output sometime in the
future.
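A hypothetical reader for such a codebook might look like the
following Python sketch.  It follows the column / name / missing-code
layout and the 'none' convention described above, but splits fields
on whitespace for simplicity rather than reading a fixed
eight-character name field as MAPSTAT does; it is not MAPSTAT code:

```python
DEFAULT_MISSING = -1e37  # MAPSTAT's default missing-value code

def read_codebook(lines, n_vars):
    # 'none' -> variable numbers as names, default missing code.
    if len(lines) == 1 and lines[0].strip().lower() == "none":
        return [(i, str(i), DEFAULT_MISSING)
                for i in range(1, n_vars + 1)]
    book = []
    for line in lines:
        col, name, missing = line.split()
        book.append((int(col), name, float(missing)))
    return book

print(read_codebook(["1 THISIS1 -9", "2 HERESTWO -1E37"], 2))
print(read_codebook(["none"], 3))
```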
3. Variable Column Identification -
After file names the programs will typically request the number of
variables in the data file and then the number of variables to be
used in the present run.  For example, CORREL might be run on
a file containing lines for 500 cases, each with 12 variables, only 4
of which are to be intercorrelated in the present run.  The total
number of variables would then be entered as 12 and the number for
the present run as 4.
For each variable to be used the program will request information on
the column number of the variable (e.g. 1 for the first variable, 2
for the second, etc.).  These are column numbers in the raw file, not
positions among the subset to be used.  In the above example, say the
first, third, sixth, and eleventh of the 12 variables were to be
used; the user would enter 1<RETURN> 3<RETURN> 6<RETURN> and
11<RETURN>.
4. Specification of Groups, Weights and Special variables -
Occasionally, the programs will ask you to identify one of the
variables for use in weighting data, grouping data, as a dependent
variable, etc. Again, reference is by original column number of the
input data set.  For example, if the correlations in the example
above were to be weighted by population, which is contained as the
sixth variable, you would identify the weight as column 6, its
position in the raw data file.  All of the variables used as
weights, groups, etc., must have been included in the original
number of variables and in the selection of columns for the
analysis.  That is, it would not be possible to specify, for
example, column 4 as a weight, since it was not specified in the
variable list above.
5. Hints on Further Documentation -
All other information necessary is prompted for with what I hope are
explicit prompts.  If you have problems with the input queries, or
the interpretation of output, refer to a statistics book.  Some of the
multivariate routines are recognizably influenced by those in
Fortran by Cooley and Lohnes in their Multivariate Data Analysis
book. The Kmeans clustering routine is found in almost any book on
cluster analysis. Some routines lifted from numerical methods
books, etc., have references in the source code. The transformation
options are relatively well elaborated if you initially specify to
input transformations for the CON: file. Once you become familiar
with the program you can input transformations from files.
Finally, I am eminently reachable for the near future at the BBS
number at the end of this file.  If you have any questions regarding
interpretation, etc., feel free to drop me a line.
6. Hints on Power Usage -
There are a number of features which the design philosophy of
Mapstat precludes.  However, most of these features are readily
derivable by coupling TRANSFRM with the other programs.
For example, many regression packages output residuals from the
regression and plots of the standardized residuals, etc. Mapstat
does not force such a second pass through the data since it is
designed for large data sets without retention of the data in
memory. If the user desires such an analysis the residuals could be
readily computed using TRANSFRM and then plotted with PLOT.
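That two-step recipe (fit, then derive residuals in a separate
transformation pass) can be sketched as follows.  This is a Python
illustration with made-up data, using an ordinary simple-regression
fit; it is not TRANSFRM's actual command syntax:

```python
# Step 1 (REGRESS-like): fit y = a + b*x by least squares.
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 7.9]
a, b = fit_line(xs, ys)

# Step 2 (TRANSFRM-like): a second pass over the cases computes
# the residuals, which could then be handed to PLOT.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(residuals)
```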
Similarly, FACTOR produces score coefficients which could be used to
generate factor scores for further analysis etc.
Dummy variables can be coded through use of the recoding facilities
in TRANSFRM and used to compute complicated general linear model
analyses of variance (GLM/ANOVA's) through REGRESS.
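Dummy coding of a grouping variable for a regression-based ANOVA
might look like this.  The sketch is in Python with hypothetical
data; in practice TRANSFRM's recode facilities would produce the
equivalent columns for REGRESS:

```python
# Recode a k-level group variable into k-1 dummy (0/1) columns,
# treating the first level as the reference category, as one would
# do before fitting a GLM/ANOVA via multiple regression.
def dummy_code(groups):
    levels = sorted(set(groups))
    return [[1.0 if g == lvl else 0.0 for lvl in levels[1:]]
            for g in groups]

print(dummy_code([1, 2, 3, 2, 1]))
```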
The list goes on, and on, and on. The more you know about
statistics and what you are doing the more you will find these
programs of use.  At the same time, if you are a basic user you will
probably not require more than the basic output provided by the
routines.
PROGRAM LIMITATIONS:
The addition of codebooks and transformation files makes these
routines roughly competitive with other micro statistics packages.
Given you have received them free of cost and, "omigosh," with the
source code, they are extremely flexible and useful tools for data
analysis.
Both DESCRPT and CORREL now allow weighted data to be entered.
While the Spicer algorithm provides good accuracy on computations in
both these programs, it is less robust with weighted data.  The
results are sufficient for most purposes, but exercise caution with
heavily weighted data.
At this stage, with humble documentation, it is up to the user to
look at the type and variable declarations at the beginning of each
program to see what its limitations (on the number of variables,
etc.) are.
I think if you are doing any REAL data analysis you will find the
provisions ample.
I have relied almost exclusively upon these routines in several
analyses published over the last couple of years and they have been
scrutinized by a number of graduate students and colleagues. While
I can't guarantee any revision won't create some obscure bug, I can
assure you there are no subtle bugs of any significance for regular
data analysis. As with all statistical software, you should avoid
absurd or extreme value input.
Leave messages on the LILLIPUTE ZNODE (312-649-1730).