_______________________________________________________
MMMMMMMMMMMM VV VV SSSSSSSS PPPPPPPP
MM MM MM VV VV SS PP PP
MM MM MM VV VV SSSSSSSS PPPPPPPP
MM MM MM VV VV SS PP
MM MM MM * VVV * SSSSSSSS * PP *
_______________________________________________________
S H A R E W A R E
-----------------
A MultiVariate Statistics Package
for the IBM PC and Compatibles
(C) Copyright Warren L. Kovach, 1986-1993
Kovach Computing Services
85 Nant-y-Felin
Pentraeth, Anglesey LL75 8UY Wales U.K.
Internet: warrenk@cix.compulink.co.uk
CompuServe: 100016,2265
Ver. 2.1, June, 1993
This program is being distributed as shareware. You may evaluate
it for up to 30 days. If after that period you decide to
continue using the program you must register. This costs 65 UK
pounds or the equivalent in US dollars. See "The Shareware
Concept" in this manual, the file REGISTER.DOC, or the "Register"
option on the main menu for more details.
MVSP Ver. 2.1 -- Users Manual Page 2
ACKNOWLEDGEMENTS
In the years since I first released MVSP, I have received
countless letters about this program, many with some very useful
suggestions and comments. I have considered all of these and
have incorporated most into this new version. My thanks go to
all of those who have sent in comments. Special thanks go to
John Birks (Bergen, Norway), Geoffrey King (Pickering, Yorkshire,
England), Lou Maher (Madison, Wisconsin, USA), John Breen
(Limerick, Ireland), and Bill Briggs (Boulder, Colorado, USA) for
numerous comments on both the old and new versions of the
program. Very special thanks go to my wife, Catherine Duigan,
for numerous suggestions for improvements in the program, help in
designing this manual and the cover, assistance in the
distribution of MVSP, and for putting up with many hours of
computer-widowhood.
Warren L. Kovach
"Tigh an-Oilean"
Pentraeth, Anglesey, Wales
June 1993
This manual and the accompanying program are protected by
international copyright laws; (C) Copyright 1986-1993 Dr. Warren
L. Kovach. This manual and the accompanying computer program may
not be reproduced except as outlined in the section below
entitled "Limited User Licence".
TABLE OF CONTENTS
Acknowledgements
Introduction
The Shareware Concept
Limited Warranty
General Use of Program
  Starting the program
  Menus
  Entering and Editing Text
Menu Options
  A-F: Statistical Procedures
  M: Manipulate Data
  I: Import/Export
  S: Change Drive or Sub-directory
  Q: Quit MVSP
  X: Execute DOS commands
  P: Change Program Defaults
     Screen Colors
     Data File and Work File Path
     Data File Extension
     Output Format
     Graphics Options
     Printer Setup
MVSP Data Editor
  Entering Data Labels
  Entering Data
  Editing Labels and Data
  Saving Data Matrix
Data File Format
Data Manipulation
Import/Export Data
Running Numerical Procedures
  Principal Components Analysis
  Principal Coordinates Analysis
  Correspondence Analysis
  Distances and Similarities
  Cluster Analysis
  Diversity Indices
Utilities
  Sortdata
Disclaimer
80x87 Support
Protected Mode Version
Appendices
References
Other Products from Kovach Computing Services
INTRODUCTION
MVSP is a package of common multivariate statistical procedures
widely used in many areas of biology and geology, as well as
other fields. These procedures include principal components
analysis (PCA), principal coordinates analysis (PCO),
correspondence analysis (CA; also called reciprocal averaging),
distance or similarity measures, hierarchical cluster analysis,
and diversity indices. MVSP provides a great deal of flexibility
in the analyses, but is simple to use. Options for different
forms of these analyses can be chosen from menus and these
settings can be saved for future use. Most analyses can be run
with as few as half a dozen keystrokes.
One possible drawback to ease of use is that some users may be
very tempted to take a "black box" approach to using these
statistics, feeding in numbers and coming up with "The Answer".
I must strongly warn the users of this program that statistics
can be DANGEROUS! All these procedures make assumptions about
the data and have restrictions on what they can and cannot do. If
these assumptions and restrictions are violated, the results
could be meaningless. I urge you to become familiar with the
methods before you use this program. This manual contains a list
of references that I have found very useful in understanding
these techniques. In particular, Sneath & Sokal (1973), Gauch
(1982), Pielou (1984), Manly (1986), Davis (1986), and Kent and
Coker (1992) are very well written and give very clear
discussions of these techniques.
I am always interested to see how MVSP is being used. I would
appreciate receiving reprints of any papers you have published in
which MVSP was used for data analysis. Thank you!
THE SHAREWARE CONCEPT
This software package is being distributed under the shareware
concept. In case you haven't run across this software
phenomenon, the following is a brief discussion of its tenets.
Shareware software is an experiment in "grass-roots" software
distribution and development. Andrew Fluegelman, one of the
pioneers of this phenomenon in the microcomputer world, expressed
it this way:
1) The value and utility of software is best assessed by the
user on his or her own system, under actual working
conditions.
2) The creation of new and useful software should be supported
by the computing community.
3) Copying and sharing of software that you have found useful
should be encouraged, rather than restricted.
Shareware programs are freely distributed to the computing
community, through the network of electronic bulletin board
services, local computer user groups, shareware disk vendors, and
networks of friends and colleagues with similar interests. You
are allowed to try out the program for a certain period to see if
it fits your needs. If it does and you intend to continue using
it, then you must register the program with the author by paying
a registration fee. In return you will generally get a copy of
the latest version of the program, a printed manual, and perhaps
other extras the author offers to encourage you to register.
Shareware means that you don't have to pay outrageous prices for
a program without getting a chance to test drive it first to see
if it really meets your needs. Shareware means that if you
decide that this program is worth supporting, then you support it
voluntarily, for a reasonable cost, and without the hassles of
copy-protection and the high cost of advertising.
You are encouraged to copy and distribute MVSP Shareware. If
after a 30 day evaluation period you find this program to be
useful and decide to continue using it, then a registration fee
of 65 UK pounds or the equivalent in US dollars should be sent to
the author. See the file REGISTER.DOC, or the "Register" option
on the main menu for details on how to register.
In return for the contribution, you will receive:
o the latest version of the program (without the shareware
reminder messages)
o a full printed manual, including the graphics and appendices
that are not in the shareware version
o the ability to take advantage of the 80x87 math coprocessor for
faster and more accurate analyses
o a protected mode version that will directly use up to 16 MB of
RAM for faster analyses of larger data sets
o the SORTDATA utility that creates graphic representations of
your data matrices, sorted in the order of the dendrograms
o notification of future versions as well as of other programs
produced by Kovach Computing Services
o special upgrade prices
o technical support by phone, fax, e-mail or post
This program is copyrighted. MVSP Shareware can be freely copied
and distributed in accordance with the regulations specified in
the accompanying file VENDOR.DOC. MVSP Shareware may not be
modified or dis-assembled in any way or for any reason.
Distribution of modified versions is also forbidden.
LIMITED WARRANTY
Kovach Computing Services warrants any physical diskettes and
physical documentation provided under this agreement to be free
of defects in materials and workmanship for a period of sixty
days from the purchase.
KOVACH COMPUTING SERVICES SPECIFICALLY DISCLAIMS ALL OTHER
WARRANTIES OF ANY KIND, EXPRESSED OR IMPLIED, INCLUDING BUT NOT
LIMITED TO ANY WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
PARTICULAR PURPOSE.
The total liability of Kovach Computing Services for any claim or
damage arising out of the use of the licensed program or
otherwise related to this licence shall be limited to direct
damages which shall not exceed the price paid for the program.
IN NO EVENT SHALL THE LICENSOR BE LIABLE TO THE LICENSEE FOR
ADDITIONAL DAMAGES, INCLUDING ANY LOST PROFITS, LOST SAVINGS OR
OTHER INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE
OF OR INABILITY TO USE THE LICENSED PROGRAM, EVEN IF LICENSOR HAS
BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
This agreement does not affect your statutory rights. The
agreement shall be interpreted and enforced in accordance with
and shall be governed by the laws of England and Wales.
GENERAL USE OF THE PROGRAM
Starting the program:
This program is simple to use and menu-driven, presenting you
with the possible options at each step. It is initiated by first
logging into the disk and directory containing the program (using
the DOS commands CD, A:, C:, etc.) and typing the name of the
program, "MVSPSHAR". For instance, if you have installed MVSP on
your hard disk in the directory C:\MVSP, type:
C:
CD C:\MVSP
MVSPSHAR
The program file MVSPSHAR.EXE must be in the default directory
(the one specified in the CD command above) for the program to
work properly. If you wish to use the help facility, the file
MVSP.HLP must also be in this directory. If you have changed any
of the program defaults, the configuration file named MVSP.CNF
(which is created when you save your changes) must also be in
this directory for the new options to be reinstated.
You may also specify the location of the MVSP files using DOS
environment variables and the commands "SET" and "PATH". For
instance, if the MVSP files are in the directory C:\MVSP, you may
place the two following commands in your AUTOEXEC.BAT file:
PATH C:\;C:\MVSP (this line may also contain other directories)
SET MVSP=C:\MVSP
After rebooting, you may start the program by typing MVSPSHAR,
regardless of the current directory. You may edit your
AUTOEXEC.BAT file with any word processor or text editor that
produces plain text (ASCII) files. Many will have a special
non-document mode for this. Refer to your word processor manual
for details. The DOS EDIT or EDLIN programs may also be used.
Menus:
When the program is loaded, you will see an introductory screen
giving the name of the author, then after pressing any key you
will be presented with a menu of available procedures. The first
option on the menu will be highlighted by a rectangular cursor.
This cursor can be moved through the list of options by using the
up and down arrow keys. A choice is made by pressing the
carriage return when the correct one is highlighted, or
alternatively by typing the letter preceding the desired option.
Usually, choosing an option will bring up a second menu, from
which you can often call up a third, and so on. The number
preceding the title on each menu indicates the level you are at
in the hierarchy. If you get lost, remember that pressing 'Q' or
ESC will bring you back to the previous menu.
MVSP has an extensive help facility that provides information
about every menu option. To get help, just place the cursor on
the desired option and press the F1 key. After reading the text,
pressing any key will bring you back to the menu.
Entering and editing text:
You will often be asked to type in a string of text, such as the
name of a data file. In some cases you are provided with a
default choice, which you can accept or modify. MVSP has a
number of editing commands to help in this modification. You can
use the cursor keys to move the cursor back and forth, the DEL
and Backspace keys for deleting text, and the letter keys for
adding text. When you first begin editing a text string, the
program is in insert mode, so that any text you type will be
inserted and the remaining text will be pushed to the right.
Pressing the INS key toggles insert mode on or off (indicated by
the thickness of the cursor); with it off old text is overwritten
by the new. Pressing ESC will clear the input line to allow you
to start from scratch. If you press the Enter key after clearing
the line you will exit that procedure.
When you are entering the name of the input data file, pressing
F3 will recall the last valid filename you entered during that
session. You may then use that file again or modify it if you
want to use a similarly named file.
MENU OPTIONS
The main menu lists the six available numerical procedures as
well as a few other options. It looks like this:
<Graphic placed here in printed manual>
Options A-F:
These options are the basic numerical procedures: principal
components analysis, principal coordinates analysis,
correspondence analysis (reciprocal averaging), similarities and
distances, cluster analysis, and diversity indices. These are
described later in this document.
Option M:
The MANIPULATE DATA option provides facilities for data entry,
editing, and transformation. A simple spreadsheet-like data
editor is provided for initial entry and subsequent modification
of the data. Procedures are also provided for transposing and
transforming the data, converting to other scales, and deleting
rows and columns. The full use of these facilities is described
below.
Option I:
The IMPORT/EXPORT option allows you to transfer data between MVSP
files and other file formats. Currently Lotus 1-2-3/Symphony and
Cornell Ecology Program file formats are supported. The full use
of this option is described under "Import/Export Data" below.
Option S:
This option, CHANGE DRIVE OR SUBDIRECTORY, allows you to specify
the default location of the input and output data files. If you
enter a path name without a drive specification, the default
drive is assumed. If you enter just a drive specification (e.g.
"A" or "A:") the default path will be the current directory of
that drive. A "?" lists the sub-directories of the current
directory. A carriage return with no other input exits this
option with no changes.
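
The rules for option S can be pictured with a short Python sketch.
This is not MVSP's own code. The function name resolve_data_path
and the current_dirs table (standing in for the current directory
that DOS remembers for each drive) are assumptions made for
illustration, and the "?" listing is not modelled.

```python
import ntpath  # DOS/Windows path rules, usable on any platform


def resolve_data_path(entry, default_drive, current_dirs, current_path):
    """Return the new default data path for a user entry (hypothetical helper).

    entry         -- text typed at the prompt, e.g. "A", "A:", "\\MVSP\\DATA"
    default_drive -- drive assumed when none is given, e.g. "C:"
    current_dirs  -- current directory remembered for each drive
    current_path  -- path in force before the prompt
    """
    entry = entry.strip()
    if entry == "":
        # A bare carriage return leaves the setting unchanged.
        return current_path
    if len(entry) == 1 and entry.isalpha():
        # "A" is accepted as shorthand for "A:".
        entry += ":"
    drive, path = ntpath.splitdrive(entry)
    if drive and not path:
        # Drive only: use that drive's current directory.
        drive = drive.upper()
        return drive + current_dirs.get(drive, "\\")
    if not drive:
        # Path only: the default drive is assumed.
        return default_drive + path
    return drive.upper() + path
```

With current_dirs = {"A:": "\\", "C:": "\\MVSP"}, entering "A" or
"a:" gives A:\ and entering \MVSP\DATA gives C:\MVSP\DATA when the
default drive is C:.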
Option Q:
QUIT MVSP will exit the MVSP program and return to the DOS
prompt.
Option X:
The EXECUTE DOS COMMANDS option allows you to temporarily drop
out to (or "shell to") DOS while you are running MVSP. Rather
than exiting the program completely, this option allows you to
keep MVSP loaded in memory, with all your current options intact,
while you work at the DOS prompt. When you are ready to return
to MVSP, simply type the command "exit" at the DOS prompt.
When you shell to DOS, the running program of MVSP will be saved
to disk or EMS memory, allowing as much DOS memory to be freed up
as possible. On typing "exit" this saved image will be reloaded
and you will be returned to MVSP in the state it was when you
left.
Option P:
The CHANGE PROGRAM DEFAULTS option allows you to change many of
the default settings for the program. These specifications can
be saved to the file MVSP.CNF, which will be reloaded each time
the program is run, reinstating these defaults. When you choose
this option you will be presented with a menu asking which type
of default should be changed.
<Graphic placed here in printed manual>
C - SCREEN COLOURS allows you to change the colour of the regular
text and background, the menu text and background, the menu
frame, and the help screens and error messages. Choosing one of
these will cause a menu of available colours to appear. You can
experiment with colour combinations easily, quitting the colour
menu when you are satisfied. Note that option "F" on the menu
resets black and white colours. This option can be useful in
case you get yourself into a colour combination that is so
unreadable that you can't see the options available!
P - DATA FILE AND WORK FILE PATH changes the default path used
for data files, just like option S above. If you are using a two
floppy disk system, it is often most useful to have the program
files in drive A: and to have the default data file path set to
B:, so that data files are on another disk. If you have a hard
disk, you could have the program files in a subdirectory named
C:\MVSP (which would be the default directory when you invoke the
program) and the data either on a floppy disk in drive A: or B:,
or in a hard disk directory named C:\MVSP\DATA. You would then
specify the default data file path through this option. You can
even set up separate directories for different types of data,
which is where the temporary path change option ("S" on main
menu) would come in handy. You can always override the default
path option by specifying the drive and path when you are asked
for the name of the data file while running one of the
statistical procedures.
After entering the default data file path, you will be asked for
a disk drive where the temporary work files will be stored. If
the data set you are analysing is too large to fit in memory,
parts of it will be stored on disk until needed. This will slow
down the calculations considerably, since data retrieval from a
disk is much slower than from memory. Floppy disks are much
slower than hard disks, so always choose a hard disk for the work
files if you have one available. If your computer has extended
or expanded memory (memory above 640K), then you can set this up
as a RAMdisk that will emulate a disk drive but operate much
faster, thus speeding up any analyses that must write data to
disk. See Appendix 4 (only included with the registered version)
for details of how to do this and general information on memory
management in MVSP.
E - DATA FILE EXTENSIONS allows you to change the default
extensions for your input and output files. The default values
are *.MVS for input files and *.OUT for output files, but you can
easily change this and save your changes. The PCO and cluster
analysis procedures can have different defaults, which
facilitates the input of similarity or distance coefficients.
The coefficients program will output a symmetrical matrix in the
form required by the PCO or cluster procedures, if so asked, and
will default to the extension that you specify for PCO and
cluster analysis input (*.MVD is the initial setting). The
output files for these can also have their own default extension
(*.OT2 initially). You can also specify default extensions for
the tree description and tree order files produced by cluster
analysis.
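
As a rough illustration of how default extensions might be applied,
the Python sketch below appends the default only when the typed
file name has no extension of its own. That rule is an assumption,
as the manual does not spell it out. The extension values are the
initial settings quoted above; the dictionary keys and the function
name are invented for this sketch.

```python
import ntpath  # DOS/Windows path rules, usable on any platform

# Initial default extensions quoted in the manual; the keys are labels
# invented for this sketch.
DEFAULT_EXTENSIONS = {
    "input": ".MVS",
    "output": ".OUT",
    "pco_cluster_input": ".MVD",
    "pco_cluster_output": ".OT2",
}


def with_default_extension(filename, kind):
    """Append the default extension for `kind` when none was typed (assumed rule)."""
    root, ext = ntpath.splitext(filename)
    if ext:
        # The user supplied an extension, so leave the name alone.
        return filename
    return root + DEFAULT_EXTENSIONS[kind]
```

Under this assumption, typing POLLEN for an input file would open
POLLEN.MVS, while POLLEN.DAT would be used exactly as typed.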
R - REREAD CONFIGURATION FILE will reread the MVSP.CNF
configuration file that contains the user default settings. This
will reinstate the default settings that are normally active when
the program is initiated. This can be handy if you have made a
lot of changes to defaults during a session (without saving
them!) and you wish to return to your old defaults.
S - SAVE DEFAULTS TO FILE MVSP.CNF will save any changes in the
defaults to a configuration file, which will be reloaded every
time the program is run. If this file is not found in the same
directory as the other MVSP program files, the internal defaults
will be set.
Q - QUIT CONFIGURE will return to the main menu.
O - OUTPUT FORMAT allows you to change the format of the
printouts obtained from MVSP analyses as well as the method used
for writing to the video screen.
<Graphic placed here in printed manual>
P - The PAGE WIDTH option sets the number of characters that can
be printed per line on your printer. Normally this is 80
characters, but if you have a wide carriage printer or a printer
capable of compressed printing at 15 characters per inch, then
this can be reset to 130. The "Printer Setup" option described
below allows you to set your printer to print in compressed
mode. The "Page Width" option also affects the length of lines
in data files created by the "Data Manipulation" and "Distances
and Similarities" procedures.
C - RESULTS COLUMN WIDTH sets the number of characters used to
represent each number and column heading on the printout of the
results. With numbers, this column width is for the entire
number, including the decimal point, decimal fraction, and the
space between numbers. Thus " 2345.67" requires a column width
of at least 8 spaces, including a leading space. Narrower column
widths allow more columns to be printed across a page, thus
saving paper, but some numbers may be too large to be represented
in the smaller space. If a number is larger, the whole number
will be printed and the alignment of the columns will be
disrupted. Symmetrical matrices created by the "Distances and
Similarities" procedure also use the values specified here and in
option D.
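
The effect of the column width setting can be tried out with the
Python sketch below. It only imitates the layout rule described
above and is not taken from MVSP; the function names are
hypothetical, and MVSP's special handling of whole numbers is
ignored here.

```python
def format_row(values, column_width=8, decimals=2):
    """Right-align each value in a fixed-width column, leading space included."""
    return "".join(f"{v:{column_width}.{decimals}f}" for v in values)


def columns_per_line(page_width, column_width):
    """How many result columns fit across one printed line."""
    return page_width // column_width


print(repr(format_row([2345.67, 1.5])))   # both values padded to 8 characters
print(repr(format_row([1234567.89])))     # too wide: printed whole, alignment lost
print(columns_per_line(80, 8), columns_per_line(130, 8))
```

A value that needs more than the column width is still printed in
full, so the columns after it shift to the right, which is the
disruption mentioned above.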
D - RESULTS DECIMAL PLACES sets the number of decimal places to
be displayed for each number. Generally this should be at least
2 or 3. Whole numbers (those that have a decimal portion that is
zero to the accuracy of the computer) will be displayed without
the decimal portion. Numbers that are smaller than can be
represented in the allotted decimal places will be printed in
exponential form. For instance, if the decimal places option is
set to 3, and a number 0.00001 must be printed, it will be
printed as 1.0E-05 (1.0 x 10^-5).
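
The three cases for decimal places can be summarised in a few
lines of Python. This is a sketch of the behaviour as described,
not MVSP's actual routine. The function name is hypothetical, and
the test for "too small" (the fixed-point form rounds to zero) and
the single mantissa decimal are assumptions based on the example.

```python
def format_result(x, decimals=3):
    """Format one value following the three cases described above (a sketch)."""
    if float(x).is_integer():
        # Whole numbers are shown without a decimal portion.
        return str(int(x))
    fixed = f"{x:.{decimals}f}"
    if float(fixed) == 0.0:
        # Too small for the allotted places: fall back to exponential form.
        return f"{x:.1E}"
    return fixed
```

For example, format_result(0.00001, 3) gives "1.0E-05", while
format_result(12.0, 3) gives "12".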
O & E - DATA COLUMN WIDTH and DATA DECIMAL PLACES are similar to
the above two options, but they apply only to printouts of the
raw data and to data files created by the "Data Manipulation"
procedure. For instance, if your data are always whole numbers
less than 100, then you could set the data decimal places to 0
and the data column width to 4.
M - SCREEN OUTPUT METHOD lets you toggle between two methods of
screen output, direct screen memory output and BIOS output. The
direct memory method writes data directly to the area of memory
that controls the screen, while the BIOS method uses calls to
your computer's BIOS (basic input/output system). The direct
output method is much faster, but only works on computers that
are hardware-compatible with the IBM-PC (almost all IBM
compatibles sold these days are hardware-compatible). Direct
output also will cause problems when used under some windowing
environments such as older versions (ver. 2) of Microsoft's
Windows. If you are using one of these environments, you must
either run MVSP as a full-screen application or choose BIOS
output to allow MVSP to run in a window. Note that both Windows
3 and Quarterdeck's Desqview will run MVSP in a window without
choosing BIOS output, thus allowing faster screen output.
V - CHECK FOR VIDEO "SNOW". On some brands of colour graphics
adapter boards (most notably IBM's original), the fast method of
writing directly to the screen memory can cause interference, or
"snow", on the screen. This occurs when both the program and the
computer's graphics hardware try to work on the screen memory at
the same time. This option forces the program to check the
screen memory before writing to it to make sure there will be no
interference. This eliminates snow, but also slows down the
output somewhat. If your graphics adapter is not susceptible to
snow, then this option should be set to "No" for optimal speed.
If snow appears, then set the option to "Yes".
G - GRAPHICS OPTIONS allows you to change a number of defaults
related to the scattergrams produced by the ordination
procedures.
<Graphic placed here in printed manual>
P - SCATTERPLOT/DENDROGRAM TYPE lets you select either text or
graphics plots. Text plots are produced using regular characters
such as "-" and "|" and "*" that can be printed on any printer or
video screen. The placement of the points for scatterplots is
restricted to a grid of 70x22 or 110x55 characters, so the
accuracy of these graphs is limited. Text-based dendrograms
will be scaled to fit the width of the page and will extend as
long as necessary, even over multiple pages. Graphics plots are
produced by switching your video monitor to graphics mode and
drawing the graphs with lines and dots. These are more accurate
and aesthetic (see example below). However, you must have a
graphics monitor to display these. MVSP supports CGA, EGA, VGA,
VESA Super VGA, Hercules, and AT&T or Compaq plasma display 400-
line graphics monitors. Except for the case of the 400-line mode
(see "400 LINE GRAPHICS MODE" below), MVSP will detect which type
of monitor is present and adjust accordingly. The appropriate
device driver file (CGA.BGI, EGAVGA.BGI, VESA.BGI, HERC.BGI, or
ATT.BGI) must be in the same directory as the program files.
<Graphic placed here in printed manual>
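
To see why a character grid limits the accuracy of text plots, the
following Python sketch places points on a 70x22 grid in the way
described above. It is an illustration only: the function name,
the scaling to the data range, and the rounding rule are
assumptions, not MVSP's own method.

```python
def text_scatter(points, width=70, height=22, symbol="*"):
    """Plot (x, y) pairs on a width x height character grid (illustrative only)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid dividing by zero when all points share a coordinate.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Snap each point to the nearest character cell; this rounding is
        # what limits the accuracy of a text plot.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y_max - y) / y_span * (height - 1))  # row 0 is the top
        grid[row][col] = symbol
    return ["".join(line) for line in grid]


for line in text_scatter([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]):
    print(line)
```

Two points that fall in the same character cell, such as (0, 0)
and (0.001, 0.001) on a plot spanning 0 to 1, show up as a single
symbol.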
Graphics scatterplots can be printed on dot matrix printers
either directly or through the DOS GRAPHICS screen-dump facility.
If you have a printer that is compatible with those listed under
"Printer Setup", plots can be output directly by choosing the
PRINT GRAPHICS AUTOMATICALLY option described below. For those
with other types of printers, check your DOS manual to see if
your printer is supported by the GRAPHICS command. If so,
running GRAPHICS before MVSP will allow you to print the graph
using the Print Screen key. Note, though, that printing directly
from MVSP will give much higher resolution plots.
W - WIDE TEXT PLOTS are plots that are produced with a grid of
110x55 characters. If you have a wide carriage printer and paper
or a dot matrix printer capable of compressed mode printing (see
"PRINTER SETUP" below), then these graphs can be used, giving
higher resolution. Normally, a single wide text plot will fill a
whole page. However, I usually use a special print mode that is
a combination of compressed and superscript characters with a
line spacing of 12 lines per inch instead of the default 6
("tiny" print on the "TEXT STYLE" menu below). This produces
tiny but readable characters and allows two plots per page.
G - PLOTS PER PAGE allows you to specify how many plots to print
before issuing a new page command to the printer, thus ensuring
that plots aren't printed over the fold of the paper. In regular
text mode two plots fit per page but only one fits in wide text
mode (but see previous paragraph). In graphics mode you will be
able to fit one plot per page. If you are using the DOS GRAPHICS
utility to do a screen dump of the plot, then set this option to
one plot per page as well.
L - DATUM LABEL TYPES. By default, MVSP represents each plotted
point with a letter or other character. These symbols are also
listed on the printouts in a column headed "PLOT" so that you can
tell which case or variable is represented by each point. This
is the "Sequential" mode of data labelling. You may also choose
"Label" mode in which the first character of each datum label is
plotted. This is useful if you can assign the cases or variables
to distinct groups (such as environment type, sociological group,
or taxon). In these cases you use different letters or symbols as
the first character of each label in order to represent each
group. With these plotted, you can tell at a glance how well the
groups are distinguished by the analysis.
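
The difference between the two labelling modes can be shown with a
small Python sketch. The sequence of characters used in
"Sequential" mode is an assumption, since the manual does not list
the symbols MVSP uses, and the function name is hypothetical.

```python
import string

# Assumed symbol sequence; the manual does not list the characters MVSP uses.
SEQUENTIAL_SYMBOLS = string.ascii_uppercase + string.ascii_lowercase + string.digits


def plot_symbols(labels, mode="Sequential"):
    """Return one plotting character per datum label."""
    if mode == "Label":
        # First character of each label, so group codes show up on the plot.
        return [label[0] for label in labels]
    # One symbol per point in order, wrapping around if the symbols run out.
    return [SEQUENTIAL_SYMBOLS[i % len(SEQUENTIAL_SYMBOLS)] for i in range(len(labels))]
```

With the labels FSITE1, FSITE2 and MSITE3 (F for forest and M for
marsh, say), "Label" mode plots F, F and M, so the two groups can
be told apart at a glance.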
M - 400 LINE GRAPHICS MODE is a special mode used in AT&T 6300
and Compaq Portable III and 386 computers, among others. This is
similar to CGA high resolution mode but uses a resolution of
640x400 rather than 640x200. MVSP can usually tell what type of
display is being used, but these 400 line mode displays will be
detected as CGA monitors. To take advantage of the 400 line
mode, set this option to "Yes". The file ATT.BGI must be present
in the directory containing the MVSP program files.
A - PLOTS PER ANALYSIS allows you to specify how many axes to
plot for each analysis. If you know ahead of time that you want
to see the first three axes plotted against each other, set this
value to 3. You may wish to see the results before deciding how
many axes to plot. In this case, enter "-1" for the number of
plots; you will then be asked how many to plot as the procedure
is running.
E - PRINT GRAPHICS AUTOMATICALLY specifies that you wish to have
the graphics plots automatically printed rather than drawn on the
screen. Set this option to "Yes" to do this. If you instead
wish to examine the plot on the screen before deciding to print
it, set this option to "No". When the plot is drawn on the
screen, the program will pause to allow you to look at it. If
you decide to print it, simply press the "P" key, otherwise press
any other key to go on. Also use the "No" option if you aren't
going to print the graphics mode plots, or if you are using the
DOS GRAPHICS screen-dump facility to print them.
T - PRINTER SETUP option - This option allows you to specify what
type of printer(s) you are using. Separate printers can be used
for the output of text results and graphics plots, so that you
could, for instance, have the results printed on a dot matrix
printer and the graphs on a plotter or high resolution laser
printer. There are several options on this menu:
<Graphic placed here in printed manual>
T - TEXT PRINTER - This option allows you to choose one of
several printers for the output of text results. This output
will include the numeric results as well as graphs if text mode
plots are chosen under the "Scatterplot/Dendrogram Type" option.
The "Plain ASCII" option will send text to the printer without
any control codes. The "Other" option allows you to specify the
printer codes in a manner similar to that of MVSP version 2.0. To
do this you must first consult your printer manual to determine
the codes needed for the desired text effect. Then, using this
option, enter the decimal (not hexadecimal) codes with each
individual value preceded by a backslash (e.g. "\27\69" for bold
print on an Epson printer). You may enter the codes to set a
certain text effect and to reset the printer to its default
condition at the end.
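To make the code notation concrete, here is a small Python sketch
(not part of MVSP) that turns a string of backslash-separated
decimal codes into the bytes that would be sent to the printer.
The function name parse_printer_codes is hypothetical and the
parsing rules are an assumption based on the description above.

```python
def parse_printer_codes(spec):
    """Convert a string such as r"\27\69" into the bytes to send."""
    if not spec:
        return b""
    if not spec.startswith("\\"):
        raise ValueError("each code must be preceded by a backslash")
    # Drop the leading backslash, then split on the remaining ones.
    # bytes() rejects any value outside the range 0-255.
    return bytes(int(part) for part in spec[1:].split("\\"))


bold_on = parse_printer_codes(r"\27\69")    # ESC followed by "E"
print(list(bold_on))                        # [27, 69]
```

Code 27 is the ESC character, which is why most printer control
sequences begin with "\27".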
Y - TEXT STYLE - MVSP can automatically set up your printer to
use different text styles for the printouts. Normal printing
gives output in your printer's default text mode. Compressed
will give text that can fit up to 130 columns on a single page of
A4 or 8.5"x11" paper. Tiny print is also compressed to allow for
130 columns but the text itself is also half as high as normal,
allowing for twice as many lines per page as well. If you choose
compressed or tiny print, make sure to set the "Page Width"
option (page 10) to 130 columns.
Z - PAPER SIZE - This allows you to choose the size of paper used
in your printer. You may specify letter, legal, or A4 size. If
you are using wide carriage paper, choose the size that matches
the length of your paper.
P - GRAPHICS PRINTER - You may choose from several types of
printers and plotters for your graphics output. You may also
save the graphs to a .PCX bitmap file at 640x480 resolution.
M - GRAPHICS PRINTER MODE - Each printer type has a number of
modes associated with it. These modes cover the resolution of
output and/or the page size. The available modes vary for each
printer type. Note that with some printers, most notably the HP
Laserjet, the highest resolution printouts can often take a long
time to complete.
D - GRAPHICS OUTPUT DEVICE - You may specify to which parallel or
serial port your printer is attached. This allows you to have
two printers attached to one computer on different ports (the
text printer is always assumed to be on LPT1). You can also have
graphics output directed to a file. You can then send it to the
printer later using the DOS "COPY /B filename portname" command,
where "filename" is the name you specify for the output and
"portname" is LPT1, LPT2, COM1, or COM2.
You can also save graphics output to a file if you want to import
the plot into a graphics program for further editing or inclusion
in other documents. Many drawing, painting, and desktop
publishing/word processing programs allow you to import graphs in
a number of formats. The three graphics printer types in MVSP
that can be used for this purpose are HP Plotter, Postscript, and
bitmap. The Postscript files can be treated as Encapsulated
Postscript files (.EPS or .AI) by many programs.
W - PLOT WIDTH (CM) - This option allows you to specify the width
(in centimetres) of the graph on the page. The graph will be
centred on the page. If the value you specify is larger than the
page size it will be scaled to fill the page.
H - PLOT HEIGHT (CM) - This option, together with "Plot Width",
allows you to specify the size of the graph.
DATA EDITOR
Data files may be constructed using the MVSP data editor. This
editor is similar to a spreadsheet program. Data are entered
and presented in a tabular format, with the rows being the
variables and the columns being the individual cases or objects.
To use the data editor, first choose the "Manipulate Data" option
from the main menu and specify a filename. If that file exists,
it will be loaded into the editor for modification; if not, you
will be asked if you want to create a new file. You will now be
presented with the Data Manipulation menu. Choose "Enter/Edit
Data". If you are creating a new file, you will have the option
of reading in data from another file for modification and saving
under the new name.
You will next be asked to enter the maximum number of rows and
columns needed for the data matrix. MVSP must set aside a
certain amount of memory for working with the data matrix. If
you are editing an existing data matrix and don't plan to add new
rows or columns, then just accept the default values for rows and
columns. If you are adding rows or columns to either a new or
old matrix, then enter the maximum number needed. If you aren't
sure of the exact number, over-estimate. This will only cause
MVSP to set aside some extra memory while you are editing; it
will not have any lasting effect.
You will also be asked to enter or modify a title for the file.
This title identifies the data and will be printed out along with
the results of each analysis. You may enter up to 79 characters,
so be as descriptive as you can. Note, however, that the
"Distances and Similarities" procedure uses the last few
characters of the title to place a label on the output symmetric
matrix file identifying the coefficient used. If you use most or
all of the 79 available characters, make sure no vital
information is at the end of the title, or it will be overwritten
by the identifier.
Entering data labels:
When creating a new file, you are first presented with a blank
spreadsheet with the cursor in the upper left corner. You must
first enter some labels for the rows and columns. You will
notice that the cursor will only move about in rows and columns
that have labels or in the next blank row or column. This is to
avoid having spurious values placed in areas that aren't meant
for data. By entering a row or column label, you are telling the
program that this is another variable or case to include in the
matrix. When you enter a new label, that row or column will then
be filled with zeros to indicate that it is now considered part
of the data matrix.
To enter labels, all you need to do is to start typing the label
while the cursor is in the desired row or column. When you start
typing, the bottom line will display the word "INPUT>" and the
characters you type will appear on this line. You may edit the
text using the backspace, cursor, insert, and delete keys, as
described in the section "Entering and editing text" above. Once
you are finished typing the label, you then place the label in
the matrix by typing one of the cursor keys (but not the Enter
key). This tells the editor whether the label is for a row or
column. Pressing the up or down arrow key declares the
label to be for a row, while a left or right arrow key indicates
a column label. The cursor will also move in the appropriate
direction so that you are ready to enter another label.
The labels themselves can be up to ten characters long and may
consist of any printable character, except spaces. The following
are all valid labels:
ROW1
COLUMN_2
1st-Loc.
#3-Site
This label is NOT valid:
SITE 1
It will be read as two labels, "SITE" and "1". If you are using
labels that begin with a number (such as 1st-Loc.), you must
precede the label with a single or double quote (' or ") so that
the program will know that you are not attempting to enter
numeric data.
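The label rules can be summarized in a short Python sketch. It is
an illustration only; the function name is_valid_label is
hypothetical, and the leading-quote convention for labels that
begin with a digit is an editor keystroke that is not modelled.

```python
def is_valid_label(label):
    """True if the label has 1-10 printable characters and no spaces."""
    return (0 < len(label) <= 10
            and label.isprintable()
            and not any(ch.isspace() for ch in label))


for candidate in ["ROW1", "COLUMN_2", "1st-Loc.", "#3-Site", "SITE 1"]:
    print(candidate, is_valid_label(candidate))
```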
Entering data:
Once you have a few labels entered, you may start entering the
data themselves. This is done in a similar way to the labels;
just start typing a number when the cursor is in the appropriate
place. Input of each number is finished by pressing one of the
cursor keys or the "Enter" key. If you enter any characters that
cannot be converted to numeric form, an error message will be
displayed and you may edit the input to correct the mistake. The
only valid characters for numeric data are '0'-'9', '-', '+',
'.', and 'E'. The 'E' is used for entering numbers in scientific
notation, so that 0.00001 (1.0 x 10^-5) may also be entered as
"1.0E-05". Binary (presence/absence) data should be entered as
"0" and "1", with a "0" indicating absence.
Editing labels and data:
Editing of data and labels can be done in two ways. In either
case, the cursor must first be placed in the appropriate row and
column. Then you may either type the value anew, as you would
for entering data, or you may use one of the editing function
keys. The function of each of these keys is listed at the bottom
of the screen. Pressing F2 will bring the datum to the bottom
line of the screen, where it can be edited. F3 will allow you to
edit the row label and F4 the column label. These can be edited
and entered into the matrix as described above.
Saving data matrix:
Pressing the F9 key will save the data matrix to a file along
with all the changes you have made so far. I would suggest doing
this frequently to avoid losing any changes you have made due to
malfunction or mistakes in editing. The F10 key will save the
changes and exit back to the main menu. If you decide to abandon
the current editing session, press the ESC key. You will first
be asked to confirm that you want to exit, then you will be
returned to the main menu. All changes made since your last save
will be lost.
DATA FILE FORMAT
Data files from other sources can be imported to MVSP either
directly or with minor editing. If your data are in Lotus 1-2-3
or Symphony worksheets or in files for the Cornell Ecology
Programs (Decorana and Twinspan) then they can be imported
directly (see page 22). Otherwise the data can be transferred as
text files.
Most database and spreadsheet programs have an option for
outputting data to plain text (ASCII) files. A word processor or
text editor can then be used to modify the resulting files to the
appropriate format for MVSP (mainly by adding the file header
information, discussed below).
Data files for MVSP must be in ASCII format. This means that
they should consist only of letters or numbers, spaces, and most
other symbols represented on the keyboard. Many word processors
insert special formatting characters that MVSP cannot read.
You can check whether your word processor is one
of these by listing a word processed file to the screen with the
DOS TYPE command and looking for strange characters. If your
word processor uses these extra characters, make sure you modify
your data files in a non-document mode that creates normal ASCII
files.
DATA FILE HEADER: The first line of the data file should be a
header line, which will give the program some information about
the data, such as the number of rows and columns. It should look
something like this:
* 10 15
This header line should begin with an asterisk ("*") in the first
column of the first line of the file. This asterisk tells the
program that a header is present. If the asterisk is not found,
the program assumes that the header information is not present,
and it will prompt the user for the information. MAKE SURE that
if this header information is present, there is an asterisk
before it; if not, the header information will be read as data!
The two numbers are the number of rows and columns in the data
matrix. The above example has 10 rows and 15 columns.
You may also include data labels in the data file. These labels
will be printed on your output to help make sense of the masses
of numbers that will be spewed out. If labels are included, this
must be specified in the file header. For example:
*L 10 15
specifies a data file that includes data labels and that has 10
rows and 15 columns (NOT including the labels themselves). The
"L" must come immediately after the "*", with no intervening
spaces, or it will be read as the number of rows, and an error
will occur. The numbers of rows and columns must be separated by
at least one space from each other.
DATA LABELS: The format of the data labels is explained above
under "Entering Data Labels". When data labels are included,
both row and column labels must be present. The column labels
should be in the second row of the data file, after the header
line, and the labels should be separated by at least one space.
The labels may be continued on to subsequent lines; the program
will continue reading column labels until it has read as many as
the number of columns you have specified in the header line.
Row labels occur on the same line as the data row to which they
apply, and should precede the first datum in that row, with a
space separating the label and datum.
DATA FILE TITLES: A title may also be added to your data file on
the header line, so that you know what these data represent.
Here's an example:
*L 10 15 Test data file for MVSP
This title will be listed to the screen and placed on the output
when that file is selected. It must be separated from the other
elements of the header by at least one space, and it cannot be
more than 79 characters long. The Distances and Similarities
procedure will also place this title in the header of the matrix
output file, along with the specification of which coefficient
was used, so that the title is carried over to the clustering
program.
DATA MATRIX: The data matrix itself should consist of the data
points separated by at least one space. The data for one row can
be continued on the next line. If the number of rows or columns
you specify is wrong, the data matrix will be read incorrectly,
often without warning. If you have a 10x10 matrix without labels
and specify 9 columns by mistake, the last datum on the first row
will be read as the first datum of the second row, and so on.
This, needless to say, can play havoc with your results! All
procedures can print out the raw data so that you can check to
make sure it was read correctly. Here is a complete example
data file:
*L 5 10 Test data set for MVSP
COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10
ROW1 23 2 4 53 6 45 2 3 67 5
ROW2 10 2 4 34 1 4 3 10 20 3
ROW3 2 34 0 1 35 12 1 90 10 9
ROW4 98 12 10 4 10 9 10 5 20 31
ROW5 1 7 9 11 75 7 5 21 0 10
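For readers who want to check a file outside MVSP, the format can
be read with a few lines of Python. This is a minimal sketch, not
MVSP's own reader: the function name read_mvsp_matrix is
hypothetical, the header line is assumed to be present, and no
error checking is done beyond what Python itself provides.

```python
def read_mvsp_matrix(text):
    """Parse a rectangular MVSP data file given as one string.

    Returns (title, column_labels, row_labels, rows).
    """
    header, _, body = text.partition("\n")
    if not header.startswith("*"):
        raise ValueError("no header line found")
    fields = header.split(None, 3)          # flag, rows, columns, title
    has_labels = fields[0].upper() == "*L"
    n_rows, n_cols = int(fields[1]), int(fields[2])
    title = fields[3].strip() if len(fields) > 3 else ""

    # After the header, line breaks carry no meaning, so the rest of
    # the file is read as a flat stream of space-separated tokens.
    tokens = iter(body.split())
    col_labels = [next(tokens) for _ in range(n_cols)] if has_labels else []
    row_labels, rows = [], []
    for _ in range(n_rows):
        if has_labels:
            row_labels.append(next(tokens))
        rows.append([float(next(tokens)) for _ in range(n_cols)])
    return title, col_labels, row_labels, rows


sample = """*L 5 10 Test data set for MVSP
COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10
ROW1 23 2 4 53 6 45 2 3 67 5
ROW2 10 2 4 34 1 4 3 10 20 3
ROW3 2 34 0 1 35 12 1 90 10 9
ROW4 98 12 10 4 10 9 10 5 20 31
ROW5 1 7 9 11 75 7 5 21 0 10
"""
title, cols, row_names, data = read_mvsp_matrix(sample)
print(title)
print(row_names)
print(data[0])
```

Because the tokens are read as one stream, a wrong column count
in the header shifts every later value, which is exactly the
silent misreading described above.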
The input data files for the cluster analysis and PCO programs
use a slightly different header format. Here is an example:
*L 15 DIS Test data set for MVSP
Since the clustering and PCO programs use a symmetrical matrix as
input, it only needs one number for the size of the data matrix.
In this case the size of the matrix is 15x15. The third element
of the header is a three letter abbreviation specifying whether
the matrix is a similarity (SIM) or distance (DIS) matrix. This
code MUST be separated from the number of objects by only one
space, or it will not be read correctly. The "Distance and
Similarity" procedure of this program automatically sets up its
output files in this manner for input into these procedures.
Here is an example of a symmetrical input file, generated from an
analysis of the above matrix, using the Spearman Rank Order
Correlation Coefficient:
*L 10 SIM Test data set for MVSP - SPEARMAN
COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10
1.00
-0.15 1.00
0.36 -0.05 1.00
0.20 -0.97 0.05 1.00
-0.60 0.67 0.15 -0.60 1.00
0.30 0.21 -0.31 -0.00 0.10 1.00
0.30 -0.05 0.97 0.00 0.10 -0.50 1.00
-0.80 0.62 -0.41 -0.70 0.60 -0.30 -0.30 1.00
0.82 -0.55 -0.03 0.62 -0.82 0.41 -0.10 -0.87 1.00
0.10 0.67 0.67 -0.60 0.70 0.10 0.60 0.10 -0.41 1.00
Note that this is a lower half matrix, with diagonals (the
1.00's) included. Other forms of matrices may also be specified
for input to the clustering program, as discussed below, but this
is the default output format of the similarities and distances
procedure.
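A lower-half file of this kind can be expanded to a full square
matrix as follows. The Python sketch is an illustration only; the
function name read_lower_half is hypothetical, and it handles
just the default layout (labelled, lower half, diagonal included).

```python
def read_lower_half(text):
    """Parse a labelled lower-half symmetric matrix with diagonal.

    Returns (kind, labels, full), where kind is "SIM" or "DIS" and
    full is the square matrix as a list of lists.
    """
    header, _, body = text.partition("\n")
    fields = header.split(None, 3)          # "*L", size, SIM/DIS, title
    if fields[0].upper() != "*L":
        raise ValueError("this sketch expects a labelled (*L) file")
    n, kind = int(fields[1]), fields[2].upper()
    tokens = iter(body.split())
    labels = [next(tokens) for _ in range(n)]
    full = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):              # row i holds i + 1 values
            full[i][j] = full[j][i] = float(next(tokens))
    return kind, labels, full


demo = "*L 3 SIM demo\nA B C\n1.00\n-0.15 1.00\n0.36 -0.05 1.00\n"
kind, labels, full = read_lower_half(demo)
print(kind, labels)
print(full)
```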
DATA MANIPULATION
When you choose the "Manipulate Data" option from the main menu,
you are first asked to provide an input filename. You may then
transpose or transform those data, convert them to other scales,
or drop rows or columns that you select yourself or that
have totals of zero. Any combination of these options may be
chosen. When the Run command is chosen, a new data file will be
produced with all the changes you have selected. You will be
asked to provide a name for the new file; this must be different
from the input file.
<Graphic placed here in printed manual>
Transform data:
The Transform Data option allows you to choose to have the data
log or square root transformed before analysis. Most of these
procedures assume a normal distribution of the data, but this
assumption is often not met. Log transforming the data can
reduce the skewness of the data (Sokal & Rohlf, 1981), resulting
in a more interpretable analysis. In my work with fossil plant
data, I've found this to be invaluable, as I always have some
samples with extremely high abundances of certain taxa, and these
taxa tend to dominate the analysis due to their large numbers.
Log transforming the data evens this out. You are given the
option of what base of logarithm to use. Square root
transformation is also often used when the data are in the form
of counts. Please note that the log transformations are
performed on the values x+1, rather than x. This is done to
avoid computer errors when the data value is 0, since the log of
0 is undefined, and to avoid negative results when the value is
less than 1.
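The log(x+1) transformation can be sketched in Python as follows.
The function name log_transform is hypothetical and base 10 is
just one possible choice of base.

```python
import math


def log_transform(rows, base=10.0):
    """Replace every value x with log(x + 1) in the given base."""
    return [[math.log(x + 1.0, base) for x in row] for row in rows]


# A count of 0 stays at 0, and very large counts are pulled in.
print(log_transform([[0, 9, 99, 9999]]))
```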
The logratio transformation (Aitchison, 1986) was designed
specifically for compositional (percentage or proportional) data.
These data are affected by closure, in which the increase of one
variable necessitates the relative decrease of another, even if
the absolute value of the other doesn't change. This can cause
many problems in statistical analyses. The logratio
transformation eliminates the closure problem by replacing the
proportions with the log of the ratio between the proportion and
the geometric mean of the sample. In mathematical terms, this
is:
    x'_{i,j} = \log( x_{i,j} / g_i )

where:

    x_{i,j}  = proportion of taxon j in the ith sample
    x'_{i,j} = transformed value
    g_i      = (x_{i,1} \cdot x_{i,2} \cdots x_{i,n})^{1/n} = geometric mean
    n        = number of taxa in the sample
It should be emphasized that, for the logratio transformation to
be calculated properly, the samples MUST be the columns of the
data file. Otherwise the calculations will be meaningless.
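For a single sample with no zero proportions, the transformation
can be sketched in Python as below. The function name logratio is
hypothetical, natural logarithms are an assumption (the base only
rescales the result), and zero replacement is not attempted.

```python
import math


def logratio(sample):
    """Logratio of one sample's proportions (all must be > 0)."""
    logs = [math.log(x) for x in sample]
    log_g = sum(logs) / len(logs)           # log of the geometric mean
    return [lx - log_g for lx in logs]      # log(x / g) = log(x) - log(g)


values = logratio([0.5, 0.3, 0.2])
print(values)
print(sum(values))                          # zero, up to rounding error
```

The transformed values of a sample always sum to zero, which is
a quick way to confirm that the samples really were the columns.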
Problems arise with the logratio transformation when some of the
proportions are zeros, since taking the log of zero produces an
error. This is remedied in MVSP by replacing them with a very
small value and then readjusting all other proportions so that
the total is 1.0. The replacement values are calculated using
Aitchison's (1986, p.269) zero replacement formula. This formula
incorporates a maximum rounding-off value that can affect the
final results. You can set this value when choosing the logratio
transformation. The new value can be saved to the configuration
file. You may want to try several runs with different values to
assess the effect.
Transpose data:
The "Transpose Data" option is another that is common to all
procedures. This allows you to transpose a matrix before
analysis, so that the rows of the matrix are treated as columns.
Convert data:
Convert Data allows you to change the scale of the data to
percentages, proportions, standardized scores, binary, the octave
class scale, or range-through type stratigraphic data. In
percentage and proportional data, the values are adjusted so that
the columns sum to 100 or 1.0 (respectively). The standardized
scores are adjusted by rows to zero mean and unit standard
deviation. Binary converts all non-zero values to 1. It is
sometimes useful to perform analyses on binary data to remove the
effects of abundance on the results.
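Three of these conversions are sketched below in Python. The
function names are hypothetical, and the use of the sample
standard deviation (n - 1 divisor) is an assumption, since the
divisor is not stated here.

```python
def to_percentages(rows):
    """Scale each column so that it sums to 100."""
    totals = [sum(col) for col in zip(*rows)]
    return [[100.0 * x / t if t else 0.0 for x, t in zip(row, totals)]
            for row in rows]


def to_binary(rows):
    """Convert every non-zero value to 1."""
    return [[1 if x != 0 else 0 for x in row] for row in rows]


def standardize_rows(rows):
    """Adjust each row to zero mean and unit standard deviation."""
    result = []
    for row in rows:
        n = len(row)
        mean = sum(row) / n
        # Assumed n - 1 divisor; a constant row would divide by zero.
        sd = (sum((x - mean) ** 2 for x in row) / (n - 1)) ** 0.5
        result.append([(x - mean) / sd for x in row])
    return result


print(to_percentages([[1, 3], [3, 1]]))
print(to_binary([[0, 5], [2, 0]]))
print(standardize_rows([[1, 2, 3]]))
```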
The octave scale, which is often used in plant community ecology
(Gauch, 1982), is a ten point abundance class scale, roughly
based on log2. Percentage data are converted to the classes
based on the following scale:
0 = 0
>0 - 0.5% = 1
>0.5 - 1% = 2
>1 - 2% = 3
>2 - 4% = 4
>4 - 8% = 5
>8 - 16% = 6
>16 - 32% = 7
>32 - 64% = 8
>64 - 100% = 9
The scale was first developed as a convenience for visual
estimation of abundances. It may also be used to convert fully
quantitative data to a simpler scale. Much of the minor
variation in abundances can be viewed as stochastic noise rather
than significant trends (Gauch, 1982). By breaking the data into
ten classes this minor variation is eliminated and only the major
'signal' is preserved. Arguably, the multivariate methods
provided by MVSP should also separate out these major trends,
leaving the noise in the background, but very noisy data can
complicate this and, in ordinations, will cause the major trends
to account for only a small proportion of the total variance. In
comparisons of PCAs performed on raw, octave transformed, and
logratio transformed data, the octave scale performed as well
as the logratio, with very little difference in the results
on the first three axes (Kovach & Batten, in press).
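The class boundaries in the octave scale table translate directly
into a lookup. The Python sketch below is an illustration; the
names OCTAVE_UPPER_BOUNDS and octave_class are hypothetical.

```python
OCTAVE_UPPER_BOUNDS = [0.5, 1, 2, 4, 8, 16, 32, 64, 100]  # classes 1-9


def octave_class(percent):
    """Return the octave class (0-9) for a percentage abundance."""
    if percent <= 0:
        return 0
    # Each class includes its upper bound and excludes its lower one.
    for cls, upper in enumerate(OCTAVE_UPPER_BOUNDS, start=1):
        if percent <= upper:
            return cls
    raise ValueError("percentage above 100")


print([octave_class(p) for p in (0, 0.3, 1, 3, 10, 50, 90)])
```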
The range-through conversion is provided as a convenience to
geological biostratigraphers. In analysing the stratigraphic
distribution of fossil organisms it is often desirable to treat
taxa as being present in all samples between the first and last
occurrence in a vertical sequence. This assumes that the absence
of a species in the middle of its range is due to ecological
differences or sampling bias rather than the actual absence of
the organism from the region at that time. In performing the
range-through conversion, it is assumed that the columns are
samples, that they are arranged in stratigraphic order, and that
the data values are abundance or presence-absence (with absence
indicated by a 0). Each row (taxon) is scanned for the first and
last occurrence of that taxon, then those and all samples in
between are converted to 1's, to indicate the presence of that
species. All other samples are left at 0.
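For one taxon (one row of the matrix), the conversion can be
sketched in Python as follows. The function name range_through is
hypothetical.

```python
def range_through(row):
    """Fill a taxon's row with 1s from first to last occurrence."""
    present = [i for i, x in enumerate(row) if x != 0]
    if not present:
        return [0] * len(row)               # taxon never occurs
    first, last = present[0], present[-1]
    return [1 if first <= i <= last else 0 for i in range(len(row))]


print(range_through([0, 3, 0, 0, 7, 0]))    # [0, 1, 1, 1, 1, 0]
```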
Drop rows and columns:
There will often be times when you wish to analyse a subset of
your data. This option allows you to easily create new data
files that are subsets of another. When this option is set to
"Yes" and the procedure run, you will be presented with a list of
the row labels. You may move the cursor around the list and mark
the labels of the rows you wish to delete by pressing the space
bar. This will cause that label to be shown in reverse
highlighting (after you move the cursor), indicating that it will
be dropped. You may unmark the label by pressing the space bar
again. When all the labels to be dropped are marked, press the
carriage return. You will next see a list of the column labels,
which you can mark in the same way. Press the carriage return
again and a new data file will be created without those elements
that were marked.
Drop zero elements:
This option will scan through the data matrix looking for and
removing any rows or columns that have totals of zero. Often
when rows and columns are dropped using the previous option, some
cases or variables are left with only zero elements. This can
cause problems with some procedures: the CA procedure won't work
at all if any columns or rows contain only zeros, and such rows
and columns can distort the results of other analyses. It is a good
idea to set this option as well when you are choosing rows and
columns to be deleted.
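The effect of this option can be sketched in Python as below. The
function name drop_zero_elements is hypothetical, and labels are
left out to keep the example short.

```python
def drop_zero_elements(rows):
    """Remove rows and columns whose totals are zero."""
    kept_rows = [row for row in rows if sum(row) != 0]
    keep_cols = [j for j, col in enumerate(zip(*kept_rows))
                 if sum(col) != 0]
    return [[row[j] for j in keep_cols] for row in kept_rows]


print(drop_zero_elements([[0, 0, 0],
                          [1, 0, 2],
                          [0, 0, 3]]))
```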
IMPORT/EXPORT DATA
You can transfer data between MVSP and two other file formats.
This is done through the "Import/Export Data" menu option. With
this option, you are first presented with a menu allowing you to
specify which file format you want to use and whether to import
or export data. When you choose "Run", you are then asked for
the file name, and can request a directory listing of all files
of that format.
CEP Files:
These are files produced for the Cornell Ecology Program series,
including DECORANA and TWINSPAN, as well as related programs such as
Cajo T.F. ter Braak's CANOCO program (ter Braak, 1986). These
programs use a compressed data file format in which only non-zero
abundances are included. The data are presented in couplets,
with the first number indicating the taxon (variable) and the
second being the actual abundance. The couplets for each sample
are grouped on one or more lines, with the sample number being
specified at the beginning of each.
This import option works in a similar way to, and replaces, the
separate utility REFORMAT that was distributed with earlier
versions of MVSP.
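The idea of expanding couplets into a full matrix can be sketched
in Python. This is not a reader for CEP files: the exact layout
of the file on disk is not covered here, so the sketch starts
from records that have already been parsed. The function name
expand_couplets and the record structure are hypothetical.

```python
def expand_couplets(records, n_taxa, n_samples):
    """Build a full taxa-by-samples matrix from condensed records.

    records: iterable of (sample_number, [(taxon_number, abundance),
    ...]) with both numbers counted from 1. A sample may appear in
    several records, just as it may span several lines in the file.
    """
    matrix = [[0.0] * n_samples for _ in range(n_taxa)]
    for sample, couplets in records:
        for taxon, abundance in couplets:
            matrix[taxon - 1][sample - 1] = abundance
    return matrix


records = [(1, [(1, 5.0), (3, 2.0)]),
           (2, [(2, 1.0)]),
           (1, [(2, 4.0)])]
print(expand_couplets(records, n_taxa=3, n_samples=2))
```

Any taxon that is not mentioned for a sample is left at zero,
which is what makes the condensed form compact.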
Lotus 1-2-3/Symphony Files:
The spreadsheet files used in Lotus' programs 1-2-3 and Symphony
can be read and written by numerous other programs, so that this
format has become a common means of data exchange among IBM-PC
compatible software.
MVSP can read files produced for 1-2-3 versions 1 and 2, as well
as Symphony version 1 files (those with the extensions .WKS,
.WK1, and .WK2). The files it produces are .WKS files, for 1-2-3
version 1.
When reading Lotus files, MVSP assumes that the data are in a
matrix form, similar to the MVSP file format itself. First, it
will assume that you have a title for the data file at the top of
the spreadsheet grid, preferably in row 1. The next row will
contain the column labels, with each label occurring in the same
column as the associated data. Next the actual data will occur,
with each row of data (variables) on a single row of the
spreadsheet grid and each sample in a single column. The row
labels occur on the same row as their associated data and occur
before any of the data (preferably in column A).
MVSP will scan through the file first before actually importing
it to determine exactly where the data and labels are located, so
there is some scope for flexibility in the placement of the data.
However, if you follow the format specified above there is less
chance of failure in importing the data. After the data have
been imported you will want to check the resulting matrix for
columns or rows with the labels "ROWn" or "COLn", where n is a
number. These indicate that MVSP overestimated the extent of the
data matrix (usually due to stray cells in the spreadsheet).
These should be filled with zeros but if they actually contain
data, then you will need to check the rest of the data and the
spreadsheet for inconsistencies.
Any formulae in the data matrix will be read as numbers, with the
current result of the formula being placed in the MVSP file.
Otherwise, the data are assumed to be numbers. If any non-numeric
data are found in the area where MVSP expects to find
numeric data, they will be replaced with a missing value marker
(-999999999) so that you can easily find them and replace them
with meaningful data. You will be warned if this has occurred
during import.
RUNNING NUMERICAL PROCEDURES
When one of the numerical procedure options (A-F on the main
menu) is chosen, you will be asked for the name of the input
data file. The program will automatically add your default
extension if none is specified. So, if your datafile is named
"STUDY1.MVS" and your default extension is .MVS, you need only
type "STUDY1". If you specify another extension, or have a
filename with no extension, the program will recognize those as
long as the full name is specified. Pressing the carriage return
while the line is blank will return you to the main menu.
You may obtain a directory of the default data disk and path by
typing a "?". You may then specify a certain file mask, such as
"*.MVS" for all files with a .MVS extension or "*.*" for all
files. You will then be presented with a list of all files
matching that specification. You can now move the cursor around
with the cursor keys until you find the one you want, then press
"Enter" to select that file. If there are more files than can
fit on one screen you can use the PageUp and PageDown keys to
move between screens. Pressing ESC will take you back to the
filename prompt.
After an input file has been selected, you will be presented with
an "Analytical Defaults" menu. This allows you to set a number
of options concerning the analysis about to be performed. The
following is the menu for the principal components analysis
procedure:
<Graphic placed here in printed manual>
The "Transform" and "Transpose Data" options work in a similar
way to those in the Data Manipulation procedure. All procedures
also allow you to access the "Change Program Defaults" menu,
described above, and to save the new defaults. "Quit" will
return you to the main menu, and "Run" will initiate the running
of the procedure.
The "Printed Output" option allows you to specify what ancillary
information is to be output as well as the destination of the
output. You may select to have the raw or transformed data
printed or other intermediary results such as the similarity
matrix in the eigenanalysis procedures. It is useful in initial
analyses to see the original data to ensure they have been read
correctly, and it can be informative to peruse the intermediate
results. In the eigenanalysis procedures, you can also choose to
have the results graphed or to have the original data matrix
sorted by first axis scores and printed. This can be useful for
seeing patterns in the original data.
<Graphic placed here in printed manual>
The "Save to WKS File" option allows you to specify that the
resulting eigenvalues and scores from the ordination will be
separately saved to a Lotus-format file. This allows you to use
a spreadsheet or graphics program to produce plots of the scores
or to do further numerical analyses with the scores.
Note that all the results are stored in the file, with the
eigenvalues and percentages at the top, followed by the
component loadings or species scores, and ending with the
component scores or sample scores. If you wish to transfer just
one block of scores to your other programs you may need to edit
the spreadsheet file and delete the unneeded rows.
Alternatively, some programs that import Lotus files allow you to
specify in what row and column the results start.
The "Output Destination" menu allows you to choose whether to
send output to the printer or a file and whether to also show the
results on the screen. I often find it useful to send the
results to a file so that they can be input to other programs.
For instance, if you have a publication quality graphics program
available, you can edit the output file, deleting the extra text
so that only the loadings and scores are left, and then import
these coordinates into the graphics program for plotting, thus
saving you from having to retype them. You may first get a hard
copy of the results by sending the file to the printer with the
DOS command
COPY filename PRN
or
PRINT filename
If you have specified that the output should be sent to a file,
you will be prompted for the name of the output file when you run
the analysis. If you enter a blank carriage return, this output
file will default to the input file name plus the default output
file extension you have specified. The output file for an
analysis of STUDY1.MVS will default to STUDY1.OUT if your default
output extension is .OUT.
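The default naming rule amounts to swapping the extension, as in
this Python sketch. The function name default_output_name is
hypothetical.

```python
import os


def default_output_name(input_name, output_ext=".OUT"):
    """Input file name with its extension replaced by output_ext."""
    stem, _ = os.path.splitext(input_name)
    return stem + output_ext


print(default_output_name("STUDY1.MVS"))    # STUDY1.OUT
```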
Principal components analysis:
This procedure performs an R-mode principal components analysis.
The component loadings are scaled to unity, so that the sum of
squares of an eigenvector equals 1, and the component scores are
scaled so that the sum of squares equals the eigenvalue. Q-mode
PCA will generally have the opposite scaling. Note that many
packages, such as SPSS and SYSTAT, perform Q-mode PCA, and thus
their eigenvectors will be scaled to the eigenvalue, rather than
unity. For details on the computation and assumptions of the
technique, see Orloci (1978), Gauch (1982), Pielou (1984), Manley
(1986), and Jolliffe (1986). Orloci and Jolliffe give detailed
mathematical discussion of PCA, while Gauch, Pielou and Manley
give very clear and understandable discussions of the basis of
the technique and its use and assumptions.
In the R-mode analysis, similarity coefficients are calculated
for the descriptors (or variables), which are the rows of the
matrix, and component scores are calculated for the objects (or
cases), which are the columns of the matrix.
STANDARDIZATION AND CENTRING - The Analytical Defaults menu has
two options that affect how the PCA is calculated. You may
choose to standardize the similarity matrix before eigenanalysis
(thus creating a correlation rather than a covariance matrix),
and you may use either a centred or uncentred data matrix.
Generally a centred covariance matrix is used, but if different
units of measurement are used in the data matrix, these will need
to be standardized, and thus a correlation matrix should be used.
Standardization may also be desired in ecological studies to
reduce the effects of dominant species, so that rarer species
play a greater role in the resulting configuration. An uncentred
data matrix is called for when there is appreciable between-axes
heterogeneity. This means that different clusters of points are
associated with different axes, and have little projection on
other axes. This often occurs when different groups of samples
have completely different sets of common species, with little
overlap. See Noy-Meir (1973) and Pielou (1984) for more on this
phenomenon.
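The effect of the two options can be seen in a small Python
sketch that builds the similarity matrix between the rows
(variables) of a data matrix. It is an illustration only: the
function name similarity_matrix and the divisor n - 1 are
assumptions, since the manual does not state how MVSP computes
the matrix internally.

```python
import math


def similarity_matrix(data, standardize=False, centre=True):
    """Covariance (or correlation) matrix between the rows of `data`.

    Rows are variables and columns are objects, as in an MVSP data file.
    """
    n = len(data[0])
    rows = []
    for row in data:
        mean = sum(row) / n if centre else 0.0
        rows.append([x - mean for x in row])

    def cross(u, v):
        # Divisor n - 1 is an assumption; the manual does not state it.
        return sum(a * b for a, b in zip(u, v)) / (n - 1)

    cov = [[cross(u, v) for v in rows] for u in rows]
    if not standardize:
        return cov
    size = len(rows)
    sd = [math.sqrt(cov[i][i]) for i in range(size)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(size)]
            for i in range(size)]


# Two variables recorded in different units on three objects.
data = [[1.0, 2.0, 3.0],
        [20.0, 40.0, 60.0]]
covariance = similarity_matrix(data)
correlation = similarity_matrix(data, standardize=True)
```

In the covariance matrix the second variable dominates simply
because of its units; in the correlation matrix both variables
carry equal weight.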
MINIMUM EIGENVALUE - You may also specify the minimum eigenvalue
for which components are printed out. The possible options are
to have all components printed, only those above a certain
eigenvalue that you supply, or to base the minimum eigenvalue on
one of two rules. Kaiser's rule states that the minimum
eigenvalue should be the average of all eigenvalues (or 1 if the
correlation matrix is used). This is often considered a good
rule of thumb for determining whether a component is
interpretable (Legendre & Legendre, 1983). Jolliffe (1986)
proposed a modification of this rule in which the minimum
eigenvalue is 0.7 times the average eigenvalue. This will
usually give one or more extra components over Kaiser's rule.
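The two rules reduce to simple arithmetic on the eigenvalues. The
following Python sketch shows the cutoffs as described above; the
function names and the example eigenvalues are hypothetical and
are not output from MVSP.

```python
def minimum_eigenvalue(eigenvalues, rule="kaiser"):
    """Cutoff below which components are not reported."""
    mean = sum(eigenvalues) / len(eigenvalues)
    if rule == "kaiser":
        return mean            # average eigenvalue (1 for correlations)
    if rule == "jolliffe":
        return 0.7 * mean      # Jolliffe's modification
    raise ValueError("unknown rule: " + rule)


def retained(eigenvalues, rule="kaiser"):
    """Eigenvalues that exceed the cutoff for the chosen rule."""
    cutoff = minimum_eigenvalue(eigenvalues, rule)
    return [ev for ev in eigenvalues if ev > cutoff]


# Made-up eigenvalues of a 4 x 4 correlation matrix (they sum to 4).
eigenvalues = [2.1, 0.9, 0.75, 0.25]
print(retained(eigenvalues, "kaiser"))
print(retained(eigenvalues, "jolliffe"))
```

Here Kaiser's rule keeps only the first component, while
Jolliffe's rule also keeps the second and third.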
ACCURACY - The accuracy and speed of the eigenanalysis can be
controlled by using the "Accuracy of Solution" option.
Eigenanalysis in MVSP is performed using the cyclic Jacobi
method, which is an iterative procedure that makes repeated
passes through the matrix improving the accuracy of the solution.
The iterations stop when a certain level of accuracy, which is
supplied by the user, is reached. Greater accuracy in the
solution means that more passes must be made through the matrix,
so the program takes longer to run. The accuracy level
that you supply to MVSP is usually roughly equal to
the number of correct significant digits in the loadings and
scores of the most important components (those accounting for
more than 10% of the total variance), so that a level of
1.0 x 10^-6 means that
these should have roughly six significant digits. You can
experiment with different levels to determine the trade-offs
between speed and accuracy.
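A textbook version of the cyclic Jacobi method is sketched below
in Python to show how the accuracy level controls the number of
passes (sweeps). It is not MVSP's source code. In particular, the
stopping rule used here (the size of the off-diagonal elements
falling below the tolerance) is an assumption, and the name
cyclic_jacobi is hypothetical.

```python
import math


def cyclic_jacobi(matrix, tolerance=1.0e-6, max_sweeps=50):
    """Eigenanalysis of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors, sweeps), largest eigenvalue first.
    """
    n = len(matrix)
    a = [list(row) for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    sweeps = 0
    while sweeps < max_sweeps:
        # Stopping rule (an assumption): off-diagonal norm below tolerance.
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off <= tolerance:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = (math.pi / 4 if diff == 0.0
                         else 0.5 * math.atan(2.0 * a[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # rotate columns p and q
                    a[k][p], a[k][q] = (c * a[k][p] - s * a[k][q],
                                        s * a[k][p] + c * a[k][q])
                for k in range(n):      # rotate rows p and q
                    a[p][k], a[q][k] = (c * a[p][k] - s * a[q][k],
                                        s * a[p][k] + c * a[q][k])
                for k in range(n):      # accumulate the eigenvectors
                    v[k][p], v[k][q] = (c * v[k][p] - s * v[k][q],
                                        s * v[k][p] + c * v[k][q])
        sweeps += 1
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    eigenvalues = [a[i][i] for i in order]
    eigenvectors = [[v[k][i] for k in range(n)] for i in order]
    return eigenvalues, eigenvectors, sweeps


matrix = [[4.0, 1.0, 0.0],
          [1.0, 3.0, 1.0],
          [0.0, 1.0, 2.0]]
values, vectors, sweeps = cyclic_jacobi(matrix, tolerance=1.0e-10)
```

Each eigenvector returned by this sketch has a sum of squares of
1. Calling it with a looser tolerance never needs more sweeps
than a tighter one, which is the speed and accuracy trade-off
described above.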
RUNNING THE ANALYSIS - Choosing "Run" will initiate the analysis.
Status messages will be listed to the screen during the analysis
to let you know how things are proceeding. When it is done, the
eigenvalues and their percentage of the total variation will be
printed along with the component coefficients (or eigenvectors),
then the component scores for each principal component will be
calculated and printed.
If you have chosen to have the results graphed and have provided
a set number of axes to plot through the Graphics Options menu,
then these plots will be produced automatically. If the "Plots
Per Analysis" option on the Graphics Options menu has been set to
"Ask", you will first be prompted to enter the number. Entering
a zero will bypass the plotting procedure. See the Graphics
Options section above for more details about plotting in MVSP.
Principal coordinates analysis:
Principal coordinates analysis (PCO) is a generalized form of
PCA. Whereas PCA implicitly uses either a covariance or
correlation matrix, PCO allows you to input any matrix of metric
values. PCO may be used with any of the distances calculated by
MVSP except for the squared Euclidean distance. Of the
similarity measures only Gower's is metric. PCO is calculated as
a Q-mode eigenanalysis, therefore it only gives the eigenvectors,
not scores. Note that a PCO of Euclidean distances will give the
same results as a Q-mode PCA.
Many of the options available for PCA are not applicable to PCO.
There is one new option:
MATRIX INPUT - A matrix of distance measures must first be
calculated using the "Distances and Similarities" procedure (see
below). This matrix is then read by the PCO procedure and the
eigenvalues and eigenvectors are calculated. A number of
different input formats are available, including various forms of
half matrices and full matrices. This defaults to the same form
specified in the "Matrix Output" option of the "Distances and
Similarities" procedure.
Correspondence analysis:
The correspondence analysis (or reciprocal averaging) procedure
performs several varieties of correspondence analysis (Pielou,
1984; see also Hill, 1973, Gauch, 1982, Greenacre, 1984),
including detrended correspondence analysis (DCA; Hill & Gauch,
1980). Correspondence analysis in general is well suited for
working with count or presence/absence data, whereas PCA is
geared more towards measurement data on a continuous scale
(although PCA can also be performed on count and binary data;
Jolliffe, 1986).
DCA was developed by Hill and Gauch (1980) in order to correct
two flaws in most ordination techniques. The "arch effect" or
"horseshoe effect" is a common feature of most ordinations. This
is manifested by the points on the ordination plot being arranged
along an arch on the first two axes, rather than a straight line
as expected if the first axis represents a gradient. This is an
artifact of the data reduction process that occurs in ordination
and represents a mathematical relationship between the first two
axes, which are supposed to be independent. As a result of this
arch, the second flaw occurs in which the points at either end of
the first axis are closer together than those in the middle.
These flaws also occur in subsequent axes.
DCA was designed to remove this arch from the ordination diagram.
It does this by dividing the first axis into a number of
segments, then adjusting the scores of the points on the second
axis so that the mean score within each segment is the same.
Thus it is like cutting the plot into a number of vertical strips
and moving each up and down until the points are in a straight
line. The scores are also adjusted along the first axis so that
they are more evenly spread.
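The "vertical strips" idea can be sketched in Python as follows.
This is a deliberately simplified illustration: it subtracts the
mean second-axis score within each segment and nothing more. Any
smoothing across neighbouring segments that the published method
may apply is not attempted here, and the function name
detrend_by_segments is hypothetical.

```python
def detrend_by_segments(axis1, axis2, segments=26):
    """Subtract the mean axis-2 score within each segment of axis 1."""
    lo, hi = min(axis1), max(axis1)
    width = (hi - lo) / segments or 1.0   # guard against a zero-length axis
    groups = {}
    for index, score in enumerate(axis1):
        segment = min(int((score - lo) / width), segments - 1)
        groups.setdefault(segment, []).append(index)
    detrended = list(axis2)
    for members in groups.values():
        mean = sum(axis2[i] for i in members) / len(members)
        for i in members:
            detrended[i] = axis2[i] - mean
    return detrended


# An artificial arch: axis 2 is a quadratic function of axis 1.
axis1 = [float(x) for x in range(9)]
axis2 = [-(x - 4.0) ** 2 for x in axis1]
flattened = detrend_by_segments(axis1, axis2, segments=3)
```

After detrending, the mean second-axis score is zero within each
of the three segments, so the arch no longer dominates the plot.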
This method can often give more interpretable results, but it can
also introduce distortion of its own. It is always a good idea
to try both regular and detrended correspondence analysis on a
data set and compare the results.
Detrended correspondence analysis assumes that the actual data
being analysed are abundances of a set of variables (taxa in an
ecological study) in a set of samples. Presence/absence data may
also be used (entered as 0 and 1), but none of the data may
be negative. It is also assumed that the samples come from a
gradient in which different variables (taxa) characterize
different parts of the gradient. Although it is most commonly
used in ecology, this method may also be used in other fields
where these assumptions hold, such as archaeology or market
research.
Many of the options in CA/DCA are similar to those in the PCA
procedure. There are several new ones:
ALGORITHM - MVSP normally uses the cyclic Jacobi method of
calculating ordinations. This method calculates the scores for
all axes simultaneously. However, the detrending process cannot
be performed with this algorithm, since each axis must be
detrended against the final scores of the previous axis. Thus an
alternative algorithm can be used in which the solution for each
axis is calculated separately. This is done using the reciprocal
averaging method described by Hill (1973). The two algorithms
are referred to as "Cyclic Jacobi" and "Reciprocal Averaging"
respectively.
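For readers unfamiliar with reciprocal averaging, the following
Python sketch extracts the first axis only. It follows the
general scheme of repeated weighted averaging and rescaling, but
it is not MVSP's implementation: the starting scores, the
convergence test and the function name are assumptions, and the
data table must contain no empty rows or columns.

```python
import math


def reciprocal_averaging(table, tolerance=1.0e-10, max_iter=1000):
    """First correspondence analysis axis of a species x samples table."""
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = sum(row_tot)

    def standardize(scores):
        # Weighted centring removes the trivial (constant) solution.
        mean = sum(w * s for w, s in zip(col_tot, scores)) / total
        centred = [s - mean for s in scores]
        sd = math.sqrt(sum(w * c * c for w, c in zip(col_tot, centred)) / total)
        return [c / sd for c in centred], sd

    samples, _ = standardize([float(j) for j in range(n_cols)])
    species, eigenvalue = [], 0.0
    for _ in range(max_iter):
        # Species scores are weighted averages of the sample scores ...
        species = [sum(table[i][j] * samples[j] for j in range(n_cols))
                   / row_tot[i] for i in range(n_rows)]
        # ... and new sample scores are weighted averages of those.
        new = [sum(table[i][j] * species[i] for i in range(n_rows))
               / col_tot[j] for j in range(n_cols)]
        samples, shrink = standardize(new)
        converged = abs(shrink - eigenvalue) < tolerance
        eigenvalue = shrink    # shrinkage per cycle estimates the eigenvalue
        if converged:
            break
    return eigenvalue, species, samples


# Two species that never occur together: a perfect two-group structure.
table = [[1, 1, 0, 0],
         [0, 0, 1, 1]]
eigenvalue, species_scores, sample_scores = reciprocal_averaging(table)
```

Because each axis is found by its own iteration, the scores of
one axis are complete before the next is started, which is what
makes detrending against the previous axis possible.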
Reciprocal averaging must be used if detrending is desired. You
may also want to use this algorithm for non-detrended analyses.
The algorithm only extracts the first four axes and is
usually much faster than the eigenanalysis by the cyclic Jacobi
algorithm, which must extract all axes. This is most pronounced
with large data sets. However, you often need to see more than
the first four axes, particularly if the first four do not
account for much of the total variability in the data set. Also,
in cases where two or more of the axes have similar eigenvalues
the reciprocal averaging method may not give accurate results.
If this happens a warning message will be displayed.
The actual scores produced using the two algorithms will differ,
because the scaling is different, but the actual configuration on
a plot will be the same. The scores produced by the reciprocal
averaging method will be scaled to the standard deviation of the
species abundance along the gradient represented by the axis. If
we assume species abundance along a gradient is normally
distributed, then a species will appear, rise to its highest
abundance, and disappear in about 4 standard deviation units
(sd). Thus if the ordination axis is relatively short (less than
3-4 sd units) then the species turnover along the gradient will
be low, whereas long axes (say 12 sd units) will probably have
completely different sets of species at either end. Following
Hill's original DECORANA program, the sd units are multiplied by
100, so a distance of 400 along the axis represents 4 sd units.
The scaling of the axes produced by the eigenanalysis algorithm
will be related to the original species abundances, unless the
option is chosen to scale them to percentages.
DETRENDING - This option invokes the detrending procedure. It
can only be used with the reciprocal averaging algorithm and the
setting of the algorithm option will be changed when this option
is chosen.
WEIGHTING - When using the Jacobi algorithm, the analysis can be
run with a weighting of either the rare or the common species.
See Orloci (1978, pp. 152-168) for details of these methods of
weighting. Also, the scores can be adjusted to percentages. The
data file should have species as the rows and samples as the
columns, as in the PCA procedure.
DOWNWEIGHT RARE SPECIES - MVSP follows Hill's DECORANA program in
allowing the rare species to be downweighted before the analysis.
This is only available when the reciprocal averaging algorithm is
used. It can be useful if you want most weight to be given to
the common species, but you still want to see how the rarer taxa
are affected. Those taxa that occur in fewer than 1/5 the number
of samples that the most common taxon occurs in will be
downweighted. The amount that the species is downweighted is
related to its frequency of occurrence.
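The selection rule (though not the weights themselves, whose
formula is not given here) can be sketched in Python. The
function name taxa_to_downweight is hypothetical.

```python
def taxa_to_downweight(table):
    """Indices of rows (taxa) that qualify for downweighting.

    A taxon qualifies when it occurs in fewer than 1/5 of the number
    of samples in which the most common taxon occurs.
    """
    frequency = [sum(1 for value in row if value > 0) for row in table]
    threshold = max(frequency) / 5
    return [i for i, f in enumerate(frequency) if f < threshold]


# Three taxa in ten samples: present in 10, 1 and 2 samples respectively.
table = [[1] * 10,
         [1] + [0] * 9,
         [1, 1] + [0] * 8]
print(taxa_to_downweight(table))  # [1]
```

Only the second taxon qualifies: it occurs in one sample, which
is fewer than 10/5 = 2, whereas the third taxon occurs in exactly
two.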
SEGMENTS FOR DETRENDING - This option sets the number of segments
the axis should be divided into for the detrending process. The
default value, 26, should be adequate for most analyses, but if
the detrending does not seem to be as effective as it could be, a
larger number can be tried.
RESCALING CYCLES - When detrending is in force, the axes can also
be rescaled so that the points at the end are not closer together
than those in the middle. This rescaling is done several times
and this option allows you to vary the number of times. It is
generally not advisable to change this from the default of 4,
however, as further rescaling may reduce the effectiveness of the
ordination. Rescaling may be bypassed by entering 0 for this
option.
Distances and similarities:
This procedure calculates a variety of distance and similarity
measures. The distances are calculated between the columns of
the data matrix. An option to transpose the data matrix is
included, to allow analysis of the rows without requiring re-
entry of the data. There are numerous publications that discuss
different types of measures. I have relied on the following in
implementing the formulae used in this procedure: Prentice
(1980), Sneath & Sokal (1973), Pielou (1984), Greig-Smith (1983),
Gordon (1981), and Everitt (1980). You may refer to these for
details about the measures provided in MVSP.
MATRIX OUTPUT - This procedure is set up to allow easy input of
the resulting symmetric matrices into the cluster analysis and
PCO procedures. If you choose to input the distance matrix into
these, a copy of it, along with the appropriate header
information, will be put into a file. This matrix file can then
be used as input to the other analyses. When the procedure is
run, another filename must be specified for this matrix file.
This filename defaults to the symmetric matrix default extension.
You may use the matrix output option to specify the type of
matrix (e.g. upper or lower half matrix, diagonal present or
absent).
COEFFICIENT - There are presently eighteen measures available.
These, and their formulae, are listed below. In these formulae,
i and j represent two columns of the data matrix, k represents
the rows, and therefore $X_{ik}$ would be the datum in the kth
row of column i. Following the name of each measure is the marker
placed in the output file created by the "Distances and
Similarities" procedure (see section on "Data file format").
This marker identifies the coefficient that was used to calculate
the matrix. It is checked by the cluster analysis procedure when
the minimum variance strategy is used. Minimum variance
clustering can only be performed on squared Euclidean distances,
so this marker allows the program to ensure that the correct
distance is being used.
Euclidean distance (EUCLID):

$$ Ed_{ij} = \left( \sum_k (X_{ik} - X_{jk})^2 \right)^{1/2} $$

Squared Euclidean distance (SEUCLID):

$$ SEd_{ij} = \sum_k (X_{ik} - X_{jk})^2 $$

Standardized Euclidean distance (STEUCLID):

$$ StEd_{ij} = \left( \sum_k \left( \frac{X_{ik} - X_{jk}}{sd_k} \right)^2 \right)^{1/2} $$

where: $sd_k$ = standard deviation of all the elements of k

Cosine theta (or normalized Euclidean) distance (COSINE):

$$ CTd_{ij} = \left( \sum_k \left( \frac{X_{ik}}{ss_i} - \frac{X_{jk}}{ss_j} \right)^2 \right)^{1/2} $$

where: $ss_x = \left( \sum_k X_{xk}^2 \right)^{1/2}$

Manhattan metric distance (MANHAT):

$$ MMd_{ij} = \sum_k |X_{ik} - X_{jk}| $$

Canberra metric distance (CANBER):

$$ CMd_{ij} = \sum_k \frac{|X_{ik} - X_{jk}|}{X_{ik} + X_{jk}} $$

Chord distance (CHORD):

$$ Cd_{ij} = \left( \sum_k \left( X_{ik}^{1/2} - X_{jk}^{1/2} \right)^2 \right)^{1/2} $$

Chi-square distance (formula X2 of Prentice, 1980) (CHISQR):

$$ CSd_{ij} = \left( \sum_k \frac{(X_{ik} - X_{jk})^2}{\sum_l X_{lk}} \right)^{1/2} $$

Average distance (AVERAGE):

$$ Ad_{ij} = \left( \frac{\sum_k (X_{ik} - X_{jk})^2}{n} \right)^{1/2} $$

where: n = number of elements in each variable (i or j)

Mean character difference distance (MEANCHAR):

$$ MCDd_{ij} = \left( \frac{\sum_k |X_{ik} - X_{jk}|}{n} \right)^{1/2} $$

where: n = number of elements in each variable (i or j)
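Several of the distance formulae above translate directly into
Python, as sketched below. The functions are illustrations
written from the formulae, not MVSP's code; their names and the
treatment of double zeros in the Canberra metric are assumptions.

```python
import math


def column(data, i):
    """Column i of a matrix whose rows are the variables (index k)."""
    return [row[i] for row in data]


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    # Terms where both values are zero are skipped (an assumption made
    # here to avoid dividing by zero).
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y) if a + b != 0)


def chord(x, y):
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(x, y)))


def cosine_theta(x, y):
    ssx = math.sqrt(sum(a * a for a in x))
    ssy = math.sqrt(sum(b * b for b in y))
    return math.sqrt(sum((a / ssx - b / ssy) ** 2 for a, b in zip(x, y)))


# Two variables (rows) measured on three objects (columns).
data = [[3.0, 0.0, 4.0],
        [4.0, 0.0, 3.0]]
i, j = column(data, 0), column(data, 1)
print(euclidean(i, j), squared_euclidean(i, j), manhattan(i, j))
```

For the first two columns of this small matrix the Euclidean
distance is 5, its square is 25, and the Manhattan distance is 7.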
Pearson product moment correlation coefficient (PEARS):

$$ PCC_{ij} = \frac{\sum_k (X_{ik} - \bar{X}_i)(X_{jk} - \bar{X}_j)}{\left( \sum_k (X_{ik} - \bar{X}_i)^2 \right)^{1/2} \left( \sum_k (X_{jk} - \bar{X}_j)^2 \right)^{1/2}} $$

Spearman rank order correlation coefficient (SPEAR):

$$ SCC_{ij} = 1 - \frac{6 \sum_k (R_{ik} - R_{jk})^2}{n^3 - n} $$

where: R = rank order of element in variable

Percent similarity coefficient (PERCENT):

$$ PSc_{ij} = 200 \, \frac{\sum_k \min(X_{ik}, X_{jk})}{\sum_k (X_{ik} + X_{jk})} $$

where: min = minimum of two values

Gower general similarity coefficient (GOWER):

$$ GGSc_{ij} = \frac{\sum_k (w_{ijk} \, s_{ijk})}{\sum_k w_{ijk}} $$

where:

$$ s_{ijk} = \begin{cases} 1 - \dfrac{|X_{ik} - X_{jk}|}{\mathrm{range}(k)} & \text{for quantitative data,} \\ 1 & \text{for matches of binary or multistate data,} \\ 0 & \text{for all mismatches} \end{cases} $$

$$ w_{ijk} = \begin{cases} 0 & \text{for negative matches of binary data,} \\ 1 & \text{in all other situations} \end{cases} $$
For this coefficient, the data type for each variable (row) must
be declared. This is done through the first two characters of
the data labels: those beginning with "B_" are taken to be
binary, those with "M_" multistate, anything else is considered
quantitative. For instance a variable indicating the presence or
absence of sepals in a flower would have the label B_SEPAL, that
indicating the colour of the petals (one of four possible) would
be named M_COLOUR, and petal length would be recorded in the row
with the label LENGTH.
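The way the label prefixes select the calculation for each
variable can be sketched in Python. This is an illustration of
the rules just described, not MVSP's code; the function names and
the example flower data are made up.

```python
def variable_type(label):
    """Data type implied by the first two characters of a row label."""
    if label.startswith("B_"):
        return "binary"
    if label.startswith("M_"):
        return "multistate"
    return "quantitative"


def gower(labels, data, i, j):
    """Gower similarity between columns i and j (rows are variables)."""
    numerator = denominator = 0.0
    for label, row in zip(labels, data):
        xi, xj = row[i], row[j]
        kind = variable_type(label)
        weight = 1.0
        if kind == "quantitative":
            spread = max(row) - min(row)          # range(k)
            score = 1.0 - abs(xi - xj) / spread if spread else 1.0
        elif kind == "binary":
            present_i, present_j = xi != 0, xj != 0
            if not present_i and not present_j:
                weight = 0.0                      # negative match ignored
            score = 1.0 if present_i == present_j else 0.0
        else:                                     # multistate
            score = 1.0 if xi == xj else 0.0
        numerator += weight * score
        denominator += weight
    return numerator / denominator if denominator else 0.0


labels = ["B_SEPAL", "M_COLOUR", "LENGTH"]
data = [[1, 1, 0],            # sepals present or absent
        [2, 2, 3],            # petal colour code
        [10.0, 12.0, 20.0]]   # petal length
print(gower(labels, data, 0, 1), gower(labels, data, 0, 2))
```

The first two flowers agree on sepals and colour and differ by
2/10 of the length range, giving (1 + 1 + 0.8) / 3; the first and
third disagree on everything, giving 0.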
The following binary (presence/absence) coefficients are based on
a table of frequency of matches and mis-matches of the presence
or absence of a single variable. The binary data should be
entered into the data matrix as 0 (zero) and 1 (one). Any number
that is not zero is also treated as a one, indicating presence.
                              Sample j
                         Presence   Absence
                       ┌───────────────────────┐
   Sample i  Presence  │    a          b       │
                       │                       │
             Absence   │    c          d       │
                       └───────────────────────┘
Sorensen's coefficient (SOREN):

$$ Sc_{ij} = \frac{2a}{2a + b + c} $$

Jaccard's coefficient (JACCA):

$$ Jc_{ij} = \frac{a}{a + b + c} $$

Simple matching coefficient (MATCH):

$$ SMc_{ij} = \frac{a + d}{a + b + c + d} $$

Yule coefficient (YULE):

$$ Yc_{ij} = \frac{ad - bc}{ad + bc} $$
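The match table and the four binary coefficients can be sketched
in Python as follows. The function names are hypothetical; the
arithmetic follows the formulae above, with any non-zero value
counted as a presence.

```python
def match_table(x, y):
    """Counts a, b, c, d of matches and mismatches for two samples."""
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        in_i, in_j = xi != 0, yi != 0   # any non-zero value is a presence
        if in_i and in_j:
            a += 1
        elif in_i:
            b += 1
        elif in_j:
            c += 1
        else:
            d += 1
    return a, b, c, d


def sorensen(a, b, c, d):
    return 2 * a / (2 * a + b + c)


def jaccard(a, b, c, d):
    return a / (a + b + c)


def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)


def yule(a, b, c, d):
    return (a * d - b * c) / (a * d + b * c)


sample_i = [1, 1, 1, 0, 0, 5]
sample_j = [1, 1, 0, 1, 0, 0]
counts = match_table(sample_i, sample_j)
print(counts, jaccard(*counts), simple_matching(*counts))
```

For these two samples a = 2, b = 2, c = 1 and d = 1, so Jaccard's
coefficient is 2/5 and the simple matching coefficient is 3/6.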
Cluster analysis:
This procedure performs hierarchical agglomerative cluster
analysis of an input matrix of distance or similarity measures.
Seven forms of clustering are presently available: the four
average linkage procedures (unweighted pair group, unweighted
centroid, weighted pair group, and weighted centroid [or
median]), nearest and farthest linkage, and minimum variance.
The actual algorithm is based on Lance & Williams' (1966)
generalized clustering procedure. For clear and concise
explanations of the theory and practice behind cluster analysis,
see Sneath and Sokal (1973), Everitt (1980), Greig-Smith (1983),
and Pielou (1984).
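The idea behind a generalized (Lance and Williams type) procedure
is that one updating formula, with different coefficients, serves
all the clustering strategies. The Python sketch below shows this
for three of them. It is a textbook illustration, not MVSP's
code, and the input distances in the example are invented so that
the fusion levels resemble those in the sample report later in
this section.

```python
def coefficients(strategy, ni, nj):
    """(alpha_i, alpha_j, beta, gamma) of the updating formula."""
    if strategy == "nearest":
        return 0.5, 0.5, 0.0, -0.5
    if strategy == "farthest":
        return 0.5, 0.5, 0.0, 0.5
    if strategy == "upgma":             # unweighted pair group average
        return ni / (ni + nj), nj / (ni + nj), 0.0, 0.0
    raise ValueError("unknown strategy: " + strategy)


def cluster(labels, distances, strategy="upgma"):
    """Agglomerative clustering; `distances` maps frozenset pairs to values."""
    d = dict(distances)
    size = {label: 1 for label in labels}
    active = list(labels)
    merges = []
    while len(active) > 1:
        pairs = [(active[x], active[y]) for x in range(len(active))
                 for y in range(x + 1, len(active))]
        i, j = min(pairs, key=lambda pair: d[frozenset(pair)])
        dij = d[frozenset((i, j))]
        new = "NODE %d" % (len(merges) + 1)
        ai, aj, beta, gamma = coefficients(strategy, size[i], size[j])
        for k in active:
            if k in (i, j):
                continue
            dki, dkj = d[frozenset((k, i))], d[frozenset((k, j))]
            d[frozenset((k, new))] = (ai * dki + aj * dkj + beta * dij
                                      + gamma * abs(dki - dkj))
        size[new] = size[i] + size[j]
        active = [new] + [k for k in active if k not in (i, j)]
        merges.append((len(merges) + 1, i, j, dij, size[new]))
    return merges


labels = ["LENGTH", "WIDTH", "HEIGHT"]
distances = {frozenset(("LENGTH", "WIDTH")): 125.706,
             frozenset(("LENGTH", "HEIGHT")): 290.206,
             frozenset(("WIDTH", "HEIGHT")): 302.206}
for merge in cluster(labels, distances, "upgma"):
    print(merge)
```

With the unweighted pair group strategy, HEIGHT joins the
LENGTH/WIDTH node at the average of its two distances. Switching
to "nearest" or "farthest" changes only the coefficients, not the
procedure.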
MATRIX INPUT - A number of different input formats are available,
including various forms of half matrices and full matrices. This
defaults to the same form specified in the Matrix Output option
of the Distances and Similarities procedure.
TREE DESCRIPTION FILE - When the clustering is finished, you can
have a description of the resulting dendrogram output to a file.
This description consists of the object labels grouped by
parentheses and separated by commas, which together delimit the
clusters. Each label and each closing parenthesis is followed by
a colon and the distance between that object or group and the
next in the hierarchy. An example of this description is:
((LENGTH:125.71,WIDTH:125.71):170.50,HEIGHT:296.21);
This would correspond to a dendrogram of the form:
<Graphic placed here in printed manual>
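If you want to read a tree description file into your own
programs, a small recursive parser is enough. The Python sketch
below handles descriptions of the form shown above; it assumes
labels contain no parentheses, commas or colons, and its function
names are hypothetical.

```python
def parse_tree(text):
    """Parse a tree description into nested dictionaries."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                        # skip "("
            children = [node()]
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1                        # skip ")"
            name = None
        else:
            start = pos
            while pos < len(text) and text[pos] not in ":,()":
                pos += 1
            name, children = text[start:pos], []
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()


def height(tree):
    """Level at which a node's members are joined (0 for a single object)."""
    if not tree["children"]:
        return 0.0
    child = tree["children"][0]
    return child["length"] + height(child)


tree = parse_tree("((LENGTH:125.71,WIDTH:125.71):170.50,HEIGHT:296.21);")
inner = tree["children"][0]
print(round(height(inner), 2), round(height(tree), 2))
```

Adding the distances along any path from the top of the tree to
an object gives the level of the final fusion: 170.50 + 125.71
for LENGTH and WIDTH, and 296.21 for HEIGHT.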
Christopher Meacham has written a program called PLOTGRAM which
can be used to plot dendrograms and cladograms described in the
above format. For PLOTGRAM to properly read the description
produced by MVSP, the following two options must be set:
DIAGRAMTYPE Y
TIPS Y
PLOTGRAM is no longer included with MVSP, since MVSP can now
automatically plot dendrograms (see below). If you wish to get a
copy of PLOTGRAM that will work with MVSP-generated files, one
may be obtained, along with the Pascal source code, at cost from
Kovach Computing Services. We cannot, however, provide support
for the program or endeavour to add new types of printers.
TREE ORDER FILE - MVSP can also produce a file in which the data
labels are listed in the order they occupy in the dendrogram.
This type of file can be read by the program SORTDATA, which
accompanies the registered version of MVSP. This program is
useful for producing combination dendrograms in which two
dendrograms, one for the columns of the data matrix and another
for the rows, are plotted together with the original data matrix
in between in graphic form (see example below; also Kovach,
1988a,b; 1989 and Duigan & Kovach, 1991). This allows you to see
how the data are affecting the clustering. See the Utilities
section below for details about SORTDATA.
<Graphic placed here in printed manual>
RANDOMIZE INPUT ORDER - There have recently been some suggestions
(Bayer, 1985; Lespérance, 1990) that input order of the data
matrix can affect the results of clustering with certain types of
data sets. Changing the input order can not only change the
order of objects in the dendrogram but more importantly can also
cause some objects to be joined to different clusters. This is
particularly possible when two or more pairs of objects have
identical similarities either at the beginning or after
recalculation during the clustering procedure.
Normally the clustering procedure scans through the similarity
matrix sequentially looking for the next pair of objects to fuse.
Choosing the "Randomize" option causes the matrix to be scanned
in a random order which changes each time the procedure is run.
In order to check for chaotic behaviour in clustering, try
running two or three clusterings of the same data matrix with
this option set, then compare the dendrograms. Note that changes
in the actual order of objects in the dendrogram are to be
expected; a cluster diagram can be viewed as a 'mobile' hanging
from a ceiling in which the different clusters can rotate around.
It is the branching order in the dendrogram that is important and
this is what should be compared when testing for chaotic
behaviour.
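One way to compare branching order while ignoring the rotation of
clusters is to list, for each dendrogram, the set of objects
below every node and compare the lists. The Python sketch below
does this for trees written as nested tuples; the representation
and function names are assumptions made for the illustration.

```python
def clusters(tree):
    """Set of object groups (as frozensets) defined by a nested-tuple tree."""
    found = set()

    def members(node):
        if isinstance(node, str):
            return frozenset([node])
        group = frozenset().union(*(members(child) for child in node))
        found.add(group)
        return group

    members(tree)
    return found


def same_branching(tree1, tree2):
    """True if two dendrograms differ only by rotation of their clusters."""
    return clusters(tree1) == clusters(tree2)


run1 = (("LENGTH", "WIDTH"), "HEIGHT")
run2 = ("HEIGHT", ("WIDTH", "LENGTH"))    # same tree, clusters rotated
run3 = (("LENGTH", "HEIGHT"), "WIDTH")    # genuinely different joins
print(same_branching(run1, run2), same_branching(run1, run3))
```

The first two runs are the same 'mobile' viewed from different
angles, whereas in the third an object has joined a different
cluster.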
CONSTRAINED CLUSTERING - As stated above, normally the actual
order of objects in the dendrogram is not important. However, if
you are working with sequential data (such as in stratigraphic
geological studies), a special constrained form of cluster
analysis can be used (Birks & Gordon, 1985; Kovach, in press).
When this option is chosen, clustering proceeds as usual except
that the objects to be fused are constrained to be adjacent in
the data matrix. Therefore, the dendrogram that is produced will
have the objects in the same order as the input matrix.
This type of constraint can often cause distortion in the
dendrogram. In particular, reversals often occur where the
distance (and therefore the branching level) between two objects
is greater than that between the cluster of those two and the
next object in the hierarchy. In sequences where there is a lot
of variability, this can cause the dendrogram to be almost
uninterpretable.
OUTPUT - The output of the procedure consists of a report of the
status of the clustering procedure as each new object is added to
the cluster. The average similarity or distance of the two
groups that have just been joined is printed out, along with a
listing of the two groups and the number of objects in the newly
fused group. If a single object is added to another cluster, the
label for that object (or a numerical label corresponding to its
position in the data matrix) is printed out. If a whole group is
added, the node at which that group was formed is printed
out. For instance, a report such as:
                                             NUMBER OF OBJECTS
   NODE   GROUP 1   GROUP 2   DISSIMILARITY  IN FUSED GROUP
     1    LENGTH    WIDTH        125.706            2
     2    NODE 1    HEIGHT       296.206            3
would correspond to the dendrogram shown previously.
The results of the cluster analyses are also automatically
displayed as dendrograms. These may either be text-based or
drawn in graphics mode, depending on the setting of the
"Scatterplot/Dendrogram Type" option of the "Graphics Output"
menu (under the "Program Defaults" menu). The text-based
dendrograms will automatically be directed to the same file or
printer that the results are going to. Graphics dendrograms may
be printed by pressing "P" when the dendrogram is on the screen.
See the "Printer Setup" options on page 13 for more information.
Diversity indices:
This procedure computes three diversity indices commonly used in
ecology, Simpson's, Shannon's, and Brillouin's. See Pielou
(1969) for a discussion of the use and derivation of these
indices.
The input data file should be set up with species as rows and
samples as columns. The diversity, then, is calculated for each
column. Be forewarned that the Brillouin index calculates
factorials of the species abundances, and if any of your
abundances are high, this could take a very long time! For
abundances greater than 1000 the factorial is estimated using
Stirling's formula, which is much faster and, at these
abundances, provides a close approximation.
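How close the approximation is can be checked with a few lines of
Python. The form of Stirling's formula used below is the common
one, ln n! ~ n ln n - n + (1/2) ln(2 pi n); which form MVSP
actually uses is not stated, so treat this as an assumption.

```python
import math


def stirling_log_factorial(n):
    """Stirling's approximation to the natural log of n factorial."""
    return n * math.log(n) - n + 0.5 * math.log(2.0 * math.pi * n)


n = 1000
exact = math.lgamma(n + 1)          # ln(n!) from the standard library
approx = stirling_log_factorial(n)
print(exact, approx, exact - approx)
```

At n = 1000 the two values differ by less than 0.0001 in a total
of nearly 6000, and the difference shrinks further as n grows.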
The LOG BASE option allows you to specify whether to use
logarithms to the base 10, 2, or e. The output consists not only
of the diversity index, but also the number of species and the
evenness, which is defined as the diversity divided by the log of
the number of species.
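The three indices and the evenness measure can be sketched in
Python as shown below. The formulae are the usual textbook ones,
not taken from MVSP's source: in particular, Simpson's index is
computed here as one minus the sum of squared proportions, which
is only one of several forms in use, and the evenness shown is
that of the Shannon index.

```python
import math


def diversity(counts, base=math.e):
    """Simpson, Shannon and Brillouin indices for one sample (column)."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    species = len(counts)
    scale = math.log(base)                       # converts natural logs
    props = [c / total for c in counts]
    simpson = 1.0 - sum(p * p for p in props)    # assumed form: 1 - sum(p^2)
    shannon = -sum(p * math.log(p) for p in props) / scale
    # math.lgamma(n + 1) is ln(n!), so no explicit factorial is needed.
    brillouin = (math.lgamma(total + 1)
                 - sum(math.lgamma(c + 1) for c in counts)) / total / scale
    evenness = shannon / (math.log(species) / scale) if species > 1 else 0.0
    return {"species": species, "simpson": simpson, "shannon": shannon,
            "brillouin": brillouin, "evenness": evenness}


result = diversity([10, 10, 10, 10])
print(result)
```

Four equally abundant species give the maximum evenness of 1, and
a Shannon index equal to the log of the number of species.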
UTILITIES
Sortdata:
In many of my analyses I perform clusterings of both the samples
and species. I've found it very valuable to present the
resulting two diagrams with the original data matrix in between,
sorted in the order of the dendrograms. The data can be split
into abundance classes, which are represented by different
characters, so that the differing abundances can be seen at a
glance. In this way the structure revealed by the cluster
analyses can be seen directly in the data matrix (see Kovach,
1988a,b; 1989 for some examples). SORTDATA is a utility I've
written to help produce these diagrams. It is only included with
the registered version of MVSP.
To produce one of these combination diagrams, you must first run
two cluster analyses of the same data matrix, one with the matrix
transposed, the other not. Make sure that the "Tree Order"
option is turned on. This will produce two files with the order
of the objects in the dendrogram for SORTDATA. Next run SORTDATA
with the following parameters:
SORTDATA datafile.MVS order1.ORD order2.ORD [output.SRT]
where "datafile.MVS" is your original data file used for input to
the distance and similarity procedure, "order1.ORD" and
"order2.ORD" are the tree order files for analyses of the
transposed and non-transposed matrices, and "output.SRT" is the
file that will contain the sorted data matrix. If "output.SRT"
is missing the output will be put into a file named "datafile"
with a .SRT extension.
When the program is run, it will first read the original data
matrix, determine the lowest and highest data values, and then
ask you to define the ranges of four data classes. First you
must enter the value below which no symbol is plotted; if your
data are counts, this value will be 1. Then enter the cutoff
points between the four classes. When you are done the program
will sort the data and translate them to the abundance classes,
placing the results in a file.
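The sorting and translation steps can be sketched in Python. This
is not the SORTDATA program itself: the plotting symbols, the
treatment of values that fall exactly on a cutoff, and the
function names are all assumptions made for the illustration.

```python
import bisect


def abundance_symbol(value, minimum, cutoffs, symbols=("-", "+", "o", "#")):
    """Translate a data value to one of four class symbols (or a blank)."""
    if value < minimum:
        return " "                  # below the minimum nothing is plotted
    # Three cutoffs separate four classes; a value equal to a cutoff is
    # placed in the higher class (an assumption).
    return symbols[bisect.bisect_right(cutoffs, value)]


def sort_matrix(row_labels, col_labels, data, row_order, col_order):
    """Rearrange rows and columns into the order given by two dendrograms."""
    rows = [row_labels.index(label) for label in row_order]
    cols = [col_labels.index(label) for label in col_order]
    return [[data[i][j] for j in cols] for i in rows]


data = [[0, 7],
        [30, 1]]
sorted_data = sort_matrix(["SP1", "SP2"], ["A", "B"], data,
                          row_order=["SP2", "SP1"], col_order=["B", "A"])
lines = ["".join(abundance_symbol(v, 1, [5, 20, 50]) for v in row)
         for row in sorted_data]
print(lines)
```

Each row of the sorted matrix becomes one line of symbols, ready
to be placed between the two dendrograms.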
To assemble the resulting diagram, you must first print out the
sorted data matrix. You can use your word processor to print it,
perhaps adjusting the character font (pica, elite, or condensed)
and the line spacing to fit the diagram on one page. Then
measure the width and length of the matrix and use MVSP to
produce the dendrograms, setting the "Plot Height" and "Plot
Width" parameters on the "Printer Setup" menu to the appropriate
length (in cm) so that the dendrogram will be the same size as
the sorted data matrix. Alternatively you can reduce or enlarge
dendrograms you've already plotted with a photocopier or
photographically. The whole diagram may then be assembled on a
large (A3 or 11"x17") piece of paper.
To obtain the registered version of MVSP, with the SORTDATA
utility, see page 4, the REGISTER.DOC file, or the "Register"
option on the main menu.
DISCLAIMER
The accuracy of this program has of course been extensively
tested against the results of other programs. However,
unforeseen errors in computation can creep, and have crept, into
even the most sophisticated and widely used statistical packages. You
may wish to initially run comparisons with the results of other
programs, using your own data set, to ensure that it is working
properly with your type of data.
Note when running comparisons that there are often many methods
of computing the same routine, and results may vary, especially
in the more complex eigenanalysis procedures. In principal
components analysis, for instance, there are numerous ways of
transforming the data before eigenanalysis (see Greig-Smith,
1983, pp. 247ff), and the component loadings can be scaled either
to unity (as they are here) or to the variance of that principal
component, or in other manners. Also, the eigenanalysis can
rotate the cloud of points in different directions, so that signs
of the scores are reversed and the actual values different. The
configuration of the points will be the same, however.
If you do run into any problems with this program, whether they
be in the results or abnormalities in the running of the program,
please contact me by post or through electronic mail at the
addresses given on the cover page. Please give full details of
the problem and, if possible, the data set which you were running
when the bug cropped up.
Please note that no warranty is given for this program. The
author (Warren L. Kovach) shall not be legally liable for any
damages or lost profits arising from use or misuse of this
program. Refer to the "Limited Warranty" section on page 5 for
full details.
80x87 SUPPORT
If you aren't satisfied with the speed of this program, a faster
version that uses the 80x87 math coprocessor is distributed with
MVSP Plus, the registered version of MVSP. This coprocessor
(which is an optional chip that can be plugged into your
computer) greatly speeds up the processing of real number,
floating point arithmetic. Often this increase in speed can
amount to 10 times! This is particularly noticeable for
calculations that use logarithms and trigonometric functions.
The calculation of the Brillouin diversity index, which uses log
factorials, for an 84x84 matrix took 9 minutes 14 seconds without
a math chip but 2 minutes 41 seconds with one. A PCA, which uses
mostly arithmetic operations, of a 45x45 data matrix took one
hour with the standard version of the program, but only twenty
minutes with the 80x87 version (tests run on a 12 MHz 80286 based
Compaq Portable III).
Borland Pascal, the compiler used in developing MVSP, has an
option for creating programs that take advantage of this
processor. The programs compiled using this option will only
work on machines that have the 80x87 installed. The installation
program will detect whether your computer has a math chip and
will install the appropriate version.
To obtain the registered version of MVSP, with 80x87 support, see
page 4, the REGISTER.DOC file, or the "Register" option on the
main menu.
PROTECTED MODE VERSION
New with MVSP ver. 2.1 is a special protected mode version. It
is only provided with MVSP Plus, the registered version of MVSP.
This is compiled to run in what is called "protected mode" on the
Intel 80286, 80386, 80486, and Pentium microprocessors. The
primary advantage in using protected mode, as opposed to the
"real mode" which is compatible with the older 8088 and 8086
chips, is that all of the RAM memory in the machine can be
directly accessed, whereas real mode programs are limited to
640Kb of RAM memory. So, if you have 16 Mb of RAM in your
computer MVSP in protected mode can directly use all of it for
storing data while calculating. Under real mode, MVSP would have
to dump parts of large arrays onto disk while calculating, which
slows down the process tremendously. For example, a cluster
analysis of a 400x400 matrix, which needs to store 937 Kb of data
on disk under real mode, took 191 minutes whereas the same
analysis under protected mode took only 41 minutes (test run on
a 16 MHz 386SX based machine with 5 Mb RAM).
MVSP Protected Mode requires a computer with an 80286 or higher
microprocessor and at least 2 Mb of RAM. If your computer has
these capabilities, then the MVSP installation program will give
you the option of installing the protected mode version as well.
The protected mode version of MVSP was produced using Borland
International's Borland Pascal 7.0 compiler. The program runs
under the DPMI (DOS Protected Mode Interface) standard. This
requires that a DPMI compatible server is installed. Some 386
memory managers (such as QEMM 386) provide optional DPMI support,
as does Microsoft Windows 3.x in enhanced mode. If you do not
have a DPMI server installed, then Borland's server will
automatically be loaded and used. You need do nothing except
type the command MVSPPROT at the DOS prompt.
For MVSP Protected Mode to work properly, two files must be
present in the same directory as the MVSPPROT.EXE program. These
are DPMI16BI.OVL and RTM.EXE. These are files produced by
Borland International. They will be automatically installed
along with MVSP Protected Mode. Accompanying them is a
documentation file called DPMIUSER.DOC that explains more about
running protected mode programs, as well as discussing some other
utility programs that are included with MVSP Protected Mode. If
you have any problems running the protected mode version, refer to
this documentation file first.
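
If the protected mode version fails to start, a first step is to
confirm that the support files really are in the same directory as
MVSPPROT.EXE. The sketch below is one way to do that check. The
three file names come from this manual; the function name and the
directory passed to it are hypothetical and should be adapted to
your own installation.

```python
import os

# File names taken from the manual: the program itself plus the two
# Borland support files that must sit in the same directory.
REQUIRED_FILES = ["MVSPPROT.EXE", "DPMI16BI.OVL", "RTM.EXE"]


def missing_files(directory, required=REQUIRED_FILES):
    """Return the required files that are not present in `directory`.

    Names are compared case-insensitively, as DOS file names are.
    """
    present = {name.upper() for name in os.listdir(directory)}
    return [name for name in required if name.upper() not in present]


if __name__ == "__main__":
    # "." is a placeholder; use the MVSP installation directory.
    missing = missing_files(".")
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required files are present.")
```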
To obtain the registered version of MVSP, with the protected mode
version, see page 4, the REGISTER.DOC file, or the "Register"
option on the main menu.
APPENDICES
The printed manual with the registered version of MVSP has
several large appendices at this point, describing the example
data files, listing error messages and their meanings, explaining
the format of the configuration file, and giving information on
efficient memory management. These have been omitted from the
shareware version for brevity.
REFERENCES
Aitchison, J., 1986. The Statistical Analysis of Compositional
Data. Chapman and Hall, London.
Bayer, U., 1985. Pattern recognition problems in geology and
palaeontology. Lecture Notes in Earth Sciences, 2. Springer-
Verlag.
Birks, H.J.B. & Gordon, A.D., 1985. Numerical Methods in
Quaternary Pollen Analysis. Academic Press, London.
Cooke, D., Craven, A.H., & Clarke, G.M., 1982. Basic Statistical
Computing. Edward Arnold (Publishers) Ltd., London.
Davis, J.C., 1986. Statistics and Data Analysis in Geology, 2nd
Edition. John Wiley & Sons, New York.
Duigan, C.A. & Kovach, W.L., 1991. A study of the distribution
and ecology of littoral freshwater chydorid (Crustacea,
Cladocera) communities in Ireland using multivariate analyses.
Journal of Biogeography, 18:267-280.
Everitt, B., 1980. Cluster Analysis. 2nd Edition. Gower
Publishing Co., Hampshire, 136 pp.
Gauch, H.G. Jr., 1982. Multivariate Analysis in Community
Ecology. Cambridge University Press, New York.
Gordon, A.D., 1981. Classification. Chapman and Hall, London.
Greenacre, M.J., 1984. Theory and applications of correspondence
analysis. Academic Press, London.
Greig-Smith, P., 1983. Quantitative Plant Ecology. University
of California Press, Berkeley.
Hill, M.O., 1973. Reciprocal averaging: An eigenvector method of
ordination. Journal of Ecology, 61:237-249.
Hill, M.O., & Gauch, H.G. Jr., 1980. Detrended correspondence
analysis: An improved ordination technique. Vegetatio, 42:47-58.
Jolicoeur, P., & Mosimann, J.E., 1960. Size and shape variation
in the Painted Turtle. A principal component analysis. Growth,
24:339-354.
Jolliffe, I.T., 1986. Principal Component Analysis. Springer-
Verlag, New York.
Kent, M., & Coker, P., 1992. Vegetation description and
analysis. A practical approach. Belhaven Press, London.
Kovach, W.L., 1988a. Multivariate methods of analyzing
paleoecological data. In: W.A. DiMichele & S.L. Wing (eds.),
Methods and applications of plant paleoecology. The
Paleontological Society Special Publication, 3:72-104.
Kovach, W.L., 1988b. Quantitative palaeoecology of megaspores
and other dispersed plant remains from the Cenomanian of Kansas,
USA. Cretaceous Research, 9:265-283.
Kovach, W.L., 1989. Comparisons of multivariate analytical
techniques for use in pre-Quaternary plant paleoecology. Review
of Palaeobotany and Palynology, 60:255-282.
Kovach, W.L., in press. Multivariate techniques for
biostratigraphical correlation. Journal of the Geological
Society, London.
Kovach, W.L. & Batten, D.J., in press. Association of
palynomorphs and palynodebris with depositional environments:
quantitative approaches. In: Traverse, A. (ed.), Sedimentation
of Organic Particles. Cambridge University Press.
Lance, G.N. & Williams, W.T., 1966. A generalized sorting
strategy for computer classifications. Nature, 212:218.
Legendre, L., & Legendre, P., 1983. Numerical Ecology. Elsevier
Scientific Publishing Company, New York.
Lespérance, P.J., 1990. Cluster analysis of previously described
communities from the Ludlow of the Welsh Borderland.
Palaeontology, 33:209-224.
Manly, B.F.J., 1986. Multivariate statistical methods. A
primer. Chapman & Hall, London.
Noy-Meir, I., 1973. Data transformations in ecological
ordination. I. Some advantages of non-centering. Journal of
Ecology, 61:329-341.
Orloci, L., 1978. Multivariate Analysis in Vegetation Research,
2nd edition. W. Junk, Boston.
Pielou, E.C., 1969. An Introduction to Mathematical Ecology.
Wiley-Interscience, New York.
Pielou, E.C., 1984. The Interpretation of Ecological Data.
Wiley-Interscience, New York.
Prentice, I.C., 1980. Multidimensional scaling as a research
tool in Quaternary palynology: A review of theory and methods.
Review of Palaeobotany & Palynology, 31:71-104.
Sneath, P.H.A., & Sokal, R.R., 1973. Numerical Taxonomy. W.H.
Freeman & Co., San Francisco.
Sokal, R.R. & Rohlf, F.J., 1981. Biometry. 2nd Edition. W.H.
Freeman & Co., San Francisco.
ter Braak, C.J.F., 1986. Canonical correspondence analysis: A
new eigenvector technique for multivariate direct gradient
analysis. Ecology, 67:1167-1179.
OTHER PRODUCTS FROM KOVACH COMPUTING SERVICES
Wa-Tor for Windows - A population ecology simulation program for
Microsoft Windows.
Pit hungry sharks against tasty fish in an endless ocean. You
can set the initial numbers of fish and sharks, their birth
rates, and the shark starvation time. The population
fluctuations may be watched on the graphic display and via three
types of data plots. Based on an idea from Scientific American.
Price: £10.
Coming soon: Oriana - Orientation analysis for the IBM-PC and
compatibles.
This program calculates circular statistics on orientation data
measured in degrees. It calculates the circular mean, standard
deviation, and polar coordinates of a sample and compares pairs
of samples using Watson's F-test and the Chi-square test. Price:
TBA.
Coming soon: Rarefact - Rarefaction analysis for the IBM-PC and
compatibles.
This program estimates ecological diversity while taking into
account differing sample sizes. It uses Hurlbert & Simberloff's
method of estimating the expected number of taxa in a random
sample of a certain size, allowing samples to be compared on an
equal basis. It also calculates the variance around the estimate.
Price: TBA.
Consulting
Do you have a data analysis problem but lack the time to tackle
it properly, or would you rather have an expert do it? Then contact
Kovach Computing Services. We provide data analysis services
using all appropriate methods in any field. Services include
publication quality graphics and full reports describing the
results and providing comments on their robustness.
----
Kovach Computing Services
85 Nant-y-Felin
Pentraeth, Anglesey LL75 8UY
Wales U.K.