Monster Media 1993 #2

Text File | 1993-06-08 | 126KB | 2,353 lines

_______________________________________________________

 MMMMMMMMMMMM  VV     VV  SSSSSSSS  PPPPPPPP
 MM   MM   MM  VV     VV  SS        PP    PP
 MM   MM   MM   VV   VV   SSSSSSSS  PPPPPPPP
 MM   MM   MM    VV VV          SS  PP
 MM   MM   MM *   VVV   * SSSSSSSS * PP       *
_______________________________________________________

S H A R E W A R E
-----------------

A MultiVariate Statistics Package
for the IBM PC and Compatibles

(C) Copyright Warren L. Kovach, 1986-1993

Kovach Computing Services
85 Nant-y-Felin
Pentraeth, Anglesey LL75 8UY
Wales U.K.

Internet: warrenk@cix.compulink.co.uk
CompuServe: 100016,2265

Ver. 2.1, June, 1993

This program is being distributed as shareware. You may evaluate it for up to 30 days. If after that period you decide to continue using the program you must register. This costs 65 UK pounds or the equivalent in US dollars. See the section "The Shareware Concept" in this manual, the file REGISTER.DOC, or the "Register" option on the main menu for more details.

MVSP Ver. 2.1 -- Users Manual

ACKNOWLEDGEMENTS

In the years since I first released MVSP, I have received countless letters about this program, many with some very useful suggestions and comments. I have considered all of these and have incorporated most into this new version. My thanks go to all of those who have sent in comments. Special thanks go to John Birks (Bergen, Norway), Geoffrey King (Pickering, Yorkshire, England), Lou Maher (Madison, Wisconsin, USA), John Breen (Limerick, Ireland), and Bill Briggs (Boulder, Colorado, USA) for numerous comments on both the old and new versions of the program.

Very special thanks go to my wife, Catherine Duigan, for numerous suggestions for improvements in the program, help in designing this manual and the cover, assistance in the distribution of MVSP, and for putting up with many hours of computer-widowhood.

Warren L. Kovach
"Tigh an-Oilean"
Pentraeth, Anglesey, Wales
June 1993

This manual and the accompanying program are protected by international copyright laws; (C) Copyright 1986-1993 Dr. Warren L. Kovach.
This manual and the accompanying computer program may not be reproduced except as outlined in the section below entitled "Limited User Licence".

TABLE OF CONTENTS

Acknowledgements
Introduction
The Shareware Concept
Limited Warranty
General Use of Program
   Starting the program
   Menus
   Entering and Editing Text
Menu Options
   A-F: Statistical Procedures
   M: Manipulate Data
   I: Import/Export
   S: Change Drive or Sub-directory
   Q: Quit MVSP
   X: Execute DOS commands
   P: Change Program Defaults
      Screen Colors
      Data File and Work File Path
      Data File Extension
      Output Format
      Graphics Options
      Printer Setup
MVSP Data Editor
   Entering Data Labels
   Entering Data
   Editing Labels and Data
   Saving Data Matrix
Data File Format
Data Manipulation
Import/Export Data
Running Numerical Procedures
   Principal Components Analysis
   Principal Coordinates Analysis
   Correspondence Analysis
   Distances and Similarities
   Cluster Analysis
   Diversity Indices
Utilities
   Sortdata
Disclaimer
80x87 Support
Protected Mode Version
Appendices
References
Other Products from Kovach Computing Services

INTRODUCTION

MVSP is a package of common multivariate statistical procedures widely used in many areas of biology and geology, as well as other fields. These procedures include principal components analysis (PCA), principal coordinates analysis (PCO), correspondence analysis (CA; also called reciprocal averaging), distance or similarity measures, hierarchical cluster analysis, and diversity indices.

MVSP provides a great deal of flexibility in the analyses, but is simple to use. Options for different forms of these analyses can be chosen from menus and these settings can be saved for future use. Most analyses can be run with as few as half a dozen keystrokes.

One possible drawback to ease of use is that some users may be very tempted to take a "black box" approach to using these statistics, feeding in numbers and coming up with "The Answer".
I must strongly warn the users of this program that statistics can be DANGEROUS! All these procedures make assumptions about the data and have restrictions on what they can and cannot do. If these assumptions and restrictions are violated, the results could be meaningless. I urge you to become familiar with the methods before you use this program. This manual contains a list of references that I have found very useful in understanding these techniques. In particular, Sneath & Sokal (1973), Gauch (1982), Pielou (1984), Manly (1986), Davis (1986), and Kent and Coker (1992) are very well written and give very clear discussions of these techniques.

I am always interested to see how MVSP is being used. I would appreciate receiving reprints of any papers you have published in which MVSP was used for data analysis. Thank you!

THE SHAREWARE CONCEPT

This software package is being distributed under the shareware concept. In case you haven't run across this software phenomenon, the following is a brief discussion of its tenets.

Shareware software is an experiment in "grass-roots" software distribution and development. Andrew Fluegelman, one of the pioneers of this phenomenon in the microcomputer world, expressed it this way:

1) The value and utility of software is best assessed by the user on his or her own system, under actual working conditions.

2) The creation of new and useful software should be supported by the computing community.

3) Copying and sharing of software that you have found useful should be encouraged, rather than restricted.

Shareware programs are freely distributed to the computing community, through the network of electronic bulletin board services, local computer user groups, shareware disk vendors, and networks of friends and colleagues with similar interests. You are allowed to try out the program for a certain period to see if it fits your needs.
If it does and you intend to continue using it, then you must register the program with the author by paying a registration fee. In return you will generally get a copy of the latest version of the program, a printed manual, and perhaps other extras the author offers to encourage you to register.

Shareware means that you don't have to pay outrageous prices for a program without getting a chance to test drive it first to see if it really meets your needs. Shareware means that if you decide that this program is worth supporting, then you support it voluntarily, for a reasonable cost, and without the hassles of copy-protection and the high cost of advertising.

You are encouraged to copy and distribute MVSP Shareware. If after a 30-day evaluation period you find this program to be useful and decide to continue using it, then a registration fee of 65 UK pounds or the equivalent in US dollars should be sent to the author. See the file REGISTER.DOC, or the "Register" option on the main menu, for details on how to register. In return for the contribution, you will receive:

o the latest version of the program (without the shareware reminder messages)

o a full printed manual, including the graphics and appendices that are not in the shareware version

o the ability to take advantage of the 80x87 math coprocessor for faster and more accurate analyses

o a protected mode version that will directly use up to 16Mb of RAM for faster analyses of larger data sets

o the SORTDATA utility that creates graphic representations of your data matrices, sorted in the order of the dendrograms

o notification of future versions as well as of other programs produced by Kovach Computing Services

o special upgrade prices

o technical support by phone, fax, e-mail or post

This program is copyrighted. MVSP Shareware can be freely copied and distributed in accordance with the regulations specified in the accompanying file VENDOR.DOC.
MVSP Shareware may not be modified or disassembled in any way or for any reason. Distribution of modified versions is also forbidden.

LIMITED WARRANTY

Kovach Computing Services warrants any physical diskettes and physical documentation provided under this agreement to be free of defects in materials and workmanship for a period of sixty days from the purchase.

KOVACH COMPUTING SERVICES SPECIFICALLY DISCLAIMS ALL OTHER WARRANTIES OF ANY KIND, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.

The total liability of Kovach Computing Services for any claim or damage arising out of the use of the licensed program or otherwise related to this licence shall be limited to direct damages which shall not exceed the price paid for the program.

IN NO EVENT SHALL THE LICENSOR BE LIABLE TO THE LICENSEE FOR ADDITIONAL DAMAGES, INCLUDING ANY LOST PROFITS, LOST SAVINGS OR OTHER INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF OR INABILITY TO USE THE LICENSED PROGRAM, EVEN IF LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

This agreement does not affect your statutory rights. The agreement shall be interpreted and enforced in accordance with and shall be governed by the laws of England and Wales.

GENERAL USE OF THE PROGRAM

Starting the program:

This program is simple to use and menu-driven, presenting you with the possible options at each step. It is initiated by first logging into the disk and directory containing the program (using the DOS commands CD, A:, C:, etc.) and typing the name of the program, "MVSPSHAR". For instance, if you have installed MVSP on your hard disk in the directory C:\MVSP, type:

    C:
    CD C:\MVSP
    MVSPSHAR

The program file MVSPSHAR.EXE must be in the default directory (the one specified in the CD command above) for the program to work properly.
If you wish to use the help facility, the file MVSP.HLP must also be in this directory. If you have changed any of the program defaults, the configuration file named MVSP.CNF (which is created when you save your changes) must also be on the default drive for the new options to be reinstated.

You may also specify the location of the MVSP files using DOS environment variables and the commands "SET" and "PATH". For instance, if the MVSP files are in the directory C:\MVSP, you may place the two following commands in your AUTOEXEC.BAT file:

    PATH C:\;C:\MVSP     (this line may also contain other directories)
    SET MVSP=C:\MVSP

After rebooting, you may start the program by typing MVSPSHAR, regardless of the current directory. You may edit your AUTOEXEC.BAT file with any word processor or text editor that produces plain text (ASCII) files. Many will have a special non-document mode for this. Refer to your word processor manual for details. Also, the DOS EDIT or EDLIN program may be used.

Menus:

When the program is loaded, you will see an introductory screen giving the name of the author, then after pressing any key you will be presented with a menu of available procedures. The first option on the menu will be highlighted by a rectangular cursor. This cursor can be moved through the list of options by using the up and down arrow keys. A choice is made by pressing the carriage return when the correct one is highlighted, or alternatively by typing the letter preceding the desired option.

Usually, choosing an option will bring up a second menu, from which you can often call up a third, and so on. The number preceding the title on each menu indicates the level you are at in the hierarchy. If you get lost, remember that pressing 'Q' or ESC will bring you back to the previous menu.

MVSP has an extensive help facility that provides information about every menu option. To get help, just place the cursor on the desired option and press the F1 key.
After reading the text, pressing any key will bring you back to the menu.

Entering and editing text:

You will often be asked to type in a string of text, such as the name of a data file. In some cases you are provided with a default choice, which you can accept or modify. MVSP has a number of editing commands to help in this modification. You can use the cursor keys to move the cursor back and forth, the DEL and Backspace keys for deleting text, and the letter keys for adding text. When you first begin editing a text string, the program is in insert mode, so that any text you type will be inserted and the remaining text will be pushed to the right. Pressing the INS key toggles insert mode on or off (indicated by the thickness of the cursor); with it off, old text is overwritten by the new. Pressing ESC will clear the input line to allow you to start from scratch. If you press the Enter key after clearing the line you will exit that procedure.

When you are entering the name of the input data file, pressing F3 will recall the last valid filename you entered during that session. You may then use that file again or modify it if you want to use a similarly named file.

MENU OPTIONS

The main menu lists the six available numerical procedures as well as a few other options. It looks like this:

<Graphic placed here in printed manual>

Options A-F: These options are the basic numerical procedures: principal components analysis, principal coordinates analysis, correspondence analysis (reciprocal averaging), similarities and distances, cluster analysis, and diversity indices. These are described later in this document.

Option M: The MANIPULATE DATA option provides facilities for data entry, editing, and transformation. A simple spreadsheet-like data editor is provided for initial entry and subsequent modification of the data. Procedures are also provided for transposing and transforming the data, converting to other scales, and deleting rows and columns. The full use of these facilities is described below.

Option I: The IMPORT/EXPORT option allows you to transfer data between MVSP files and other file formats. Currently Lotus 1-2-3/Symphony and Cornell Ecology Program file formats are supported. The full use of this option is described in the section "Import/Export Data".

Option S: This option, CHANGE DRIVE OR SUBDIRECTORY, allows you to specify the default location of the input and output data files. If you enter a path name without a drive specification, the default drive is assumed. If you enter just a drive specification (e.g. "A" or "A:") the default path will be the current directory of that drive. A "?" lists the sub-directories of the current directory. A carriage return with no other input exits this option with no changes.

Option Q: QUIT MVSP will exit the MVSP program and return to the DOS prompt.

Option X: The EXECUTE DOS COMMANDS option allows you to temporarily drop out to (or "shell to") DOS while you are running MVSP. Rather than exiting the program completely, this option allows you to keep MVSP loaded in memory, with all your current options intact, while you work at the DOS prompt. When you are ready to return to MVSP, simply type the command "exit" at the DOS prompt.

When you shell to DOS, the running program of MVSP will be saved to disk or EMS memory, allowing as much DOS memory to be freed up as possible. On typing "exit" this saved image will be reloaded and you will be returned to MVSP in the state it was when you left.

Option P: The CHANGE PROGRAM DEFAULTS option allows you to change many of the default settings for the program. These specifications can be saved to the file MVSP.CNF, which will be reloaded each time the program is run, reinstating these defaults. When you choose this option you will be presented with a menu asking which type of default should be changed.
<Graphic placed here in printed manual>

C - SCREEN COLOURS allows you to change the colour of the regular text and background, the menu text and background, the menu frame, and the help screens and error messages. Choosing one of these will cause a menu of available colours to appear. You can experiment with colour combinations easily, quitting the colour menu when you are satisfied. Note that option "F" on the menu resets black and white colours. This option can be useful in case you get yourself into a colour combination that is so unreadable that you can't see the options available!

P - DATA FILE AND WORK FILE PATH changes the default path used for data files, just like option S above. If you are using a two floppy disk system, it is often most useful to have the program files in drive A: and to have the default data file path set to B:, so that data files are on another disk. If you have a hard disk, you could have the program files in a subdirectory named C:\MVSP (which would be the default directory when you invoke the program) and the data either on a floppy disk in drive A: or B:, or in a hard disk directory named C:\MVSP\DATA. You would then specify the default data file path through this option. You can even set up separate directories for different types of data, which is where the temporary path change option ("S" on main menu) would come in handy. You can always override the default path option by specifying the drive and path when you are asked for the name of the data file while running one of the statistical procedures.

After entering the default data file path, you will be asked for a disk drive where the temporary work files will be stored. If the data set you are analysing is too large to fit in memory, parts of it will be stored on disk until needed. This will slow down the calculations considerably, since data retrieval from a disk is much slower than from memory.
Floppy disks are much slower than hard disks, so always choose a hard disk for the work files if you have one available. If your computer has extended or expanded memory (memory above 640K), then you can set this up as a RAMdisk that will emulate a disk drive but operate much faster, thus speeding up any analyses that must write data to disk. See Appendix 4 (only included with the registered version) for details of how to do this and general information on memory management in MVSP.

E - DATA FILE EXTENSIONS allows you to change the default extensions for your input and output files. The default values are *.MVS for input files and *.OUT for output files, but you can easily change this and save your changes. The PCO and cluster analysis procedures can have different defaults, which facilitates the input of similarity or distance coefficients. The coefficients program will output a symmetrical matrix in the form required by the PCO or cluster procedures, if so asked, and will default to the extension that you specify for PCO and cluster analysis input (*.MVD is the initial setting). The output files for these can also have their own default extension (*.OT2 initially). You can also specify default extensions for the tree description and tree order files produced by cluster analysis.

R - REREAD CONFIGURATION FILE will reread the MVSP.CNF configuration file that contains the user default settings. This will reinstate the default settings that are normally active when the program is initiated. This can be handy if you have made a lot of changes to defaults during a session (without saving them!) and you wish to return to your old defaults.

S - SAVE DEFAULTS TO FILE MVSP.CNF will save any changes in the defaults to a configuration file, which will be reloaded every time the program is run. If this file is not found in the same directory as the other MVSP program files, the internal defaults will be set.

Q - QUIT CONFIGURE will return to the main menu.

O - OUTPUT FORMAT allows you to change the format of the printouts obtained from MVSP analyses as well as the method used for writing to the video screen.

<Graphic placed here in printed manual>

P - The PAGE WIDTH option sets the number of characters that can be printed per line on your printer. Normally this is 80 characters, but if you have a wide carriage printer or a printer capable of compressed printing at 15 characters per inch, then this can be reset to 130. The "Printer Setup" option described below allows you to set your printer to print in compressed mode. The "Page Width" option also affects the length of lines in data files created by the "Data Manipulation" and "Distances and Similarities" procedures.

C - RESULTS COLUMN WIDTH sets the number of characters used to represent each number and column heading on the printout of the results. With numbers, this column width is for the entire number, including the decimal point, decimal fraction, and the space between numbers. Thus " 2345.67" requires a column width of at least 8 spaces, including a leading space. Narrower column widths allow more columns to be printed across a page, thus saving paper, but some numbers may be too large to be represented in the smaller space. If a number is larger, the whole number will be printed and the alignment of the columns will be disrupted. Symmetrical matrices created by the "Distances and Similarities" procedure also use the values specified here and in option D.

D - RESULTS DECIMAL PLACES sets the number of decimal places to be displayed for each number. Generally this should be at least 2 or 3. Whole numbers (those that have a decimal portion that is zero to the accuracy of the computer) will be displayed without the decimal portion. Numbers that are smaller than can be represented in the allotted decimal places will be printed in exponential form.
For instance, if the decimal places option is set to 3, and a number 0.00001 must be printed, it will be printed as 1.0E-05 (1.0 x 10^-5).

O & E - DATA COLUMN WIDTH and DATA DECIMAL PLACES are similar to the above two options, but they apply only to printouts of the raw data and to data files created by the "Data Manipulation" procedure. For instance, if your data are always whole numbers less than 100, then you could set the data decimal places to 0 and the data column width to 4.

M - SCREEN OUTPUT METHOD lets you toggle between two methods of screen output, direct screen memory output and BIOS output. The direct memory method writes data directly to the area of memory that controls the screen, while the BIOS method uses calls to your computer's BIOS (basic input/output system). The direct output method is much faster, but only works on computers that are hardware-compatible with the IBM-PC (almost all IBM compatibles sold these days are hardware-compatible). Direct output also will cause problems when used under some windowing environments such as older versions (ver. 2) of Microsoft's Windows. If you are using one of these environments, you must either run MVSP as a full-screen application or choose BIOS output to allow MVSP to run in a window. Note that both Windows 3 and Quarterdeck's Desqview will run MVSP in a window without choosing BIOS output, thus allowing faster screen output.

V - CHECK FOR VIDEO "SNOW". On some brands of colour graphics adapter boards (most notably IBM's original), the fast method of writing directly to the screen memory can cause interference, or "snow", on the screen. This occurs when both the program and the computer's graphics hardware try to work on the screen memory at the same time. This option forces the program to check the screen memory before writing to it to make sure there will be no interference. This eliminates snow, but also slows down the output somewhat.
If your graphics adapter is not susceptible to snow, then this option should be set to "No" for optimal speed. If snow appears, then set the option to "Yes".

G - GRAPHICS OPTIONS allows you to change a number of defaults related to the scattergrams produced by the ordination procedures.

<Graphic placed here in printed manual>

P - SCATTERPLOT/DENDROGRAM TYPE lets you select either text or graphics plots. Text plots are produced using regular characters such as "-" and "|" and "*" that can be printed on any printer or video screen. The placement of the points for scatterplots is restricted to a grid of 70x22 or 110x55 characters, therefore the accuracy of these graphs is limited. Text-based dendrograms will be scaled to fit the width of the page and will extend as long as necessary, even over multiple pages.

Graphics plots are produced by switching your video monitor to graphics mode and drawing the graphs with lines and dots. These are more accurate and aesthetic (see example below). However, you must have a graphics monitor to display these. MVSP supports CGA, EGA, VGA, VESA Super VGA, Hercules, and AT&T or Compaq plasma display 400-line graphics monitors. Except for the case of the 400-line mode (see "400 LINE GRAPHICS MODE" below), MVSP will detect which type of monitor is present and adjust accordingly. The appropriate device driver file (CGA.BGI, EGAVGA.BGI, VESA.BGI, HERC.BGI, or ATT.BGI) must be in the same directory as the program files.

<Graphic placed here in printed manual>

Graphics scatterplots can be printed on dot matrix printers either directly or through the DOS GRAPHICS screen-dump facility. If you have a printer that is compatible with those listed under "Printer Setup", plots can be output directly by choosing the PRINT GRAPHICS AUTOMATICALLY option described below. For those with other types of printers, check your DOS manual to see if your printer is supported by the GRAPHICS command.
If so, running GRAPHICS before MVSP will allow you to print the graph using the Print Screen key. Note, though, that printing directly from MVSP will give much higher resolution plots.

W - WIDE TEXT PLOTS are plots that are produced with a grid of 110x55 characters. If you have a wide carriage printer and paper or a dot matrix printer capable of compressed mode printing (see "PRINTER SETUP" below), then these graphs can be used, giving higher resolution. Normally, a single wide text plot will fill a whole page. However, I usually use a special print mode that is a combination of compressed and superscript characters with a line spacing of 12 lines per inch instead of the default 6 ("tiny" print on the "TEXT STYLE" menu below). This produces tiny but readable characters and allows two plots per page.

G - PLOTS PER PAGE allows you to specify how many plots to print before issuing a new page command to the printer, thus ensuring that plots aren't printed over the fold of the paper. In regular text mode two plots fit per page but only one fits in wide text mode (but see previous paragraph). In graphics mode you will be able to fit one plot per page. If you are using the DOS GRAPHICS utility to do a screen dump of the plot, then set this option to one plot per page as well.

L - DATUM LABEL TYPES. By default, MVSP represents each plotted point with a letter or other character. These symbols are also listed on the printouts in a column headed "PLOT" so that you can tell which case or variable is represented by each point. This is the "Sequential" mode of data labelling. You may also choose "Label" mode in which the first character of each datum label is plotted. This is useful if you can assign the cases or variables to distinct groups (such as environment type, sociological group, or taxon). In these cases you use different letters or symbols as the first character of each label in order to represent each group.
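To make the text plot grid and the "Label" mode concrete, here is a minimal Python sketch that places points on a 70x22 character grid and marks each one with the first character of its label. It is an illustration under stated assumptions, not MVSP's plotting code: the function name `text_scatterplot`, the linear scaling to the data range, and the sample labels are all hypothetical.

```python
def text_scatterplot(points, labels, width=70, height=22):
    """Return the rows of a character-grid scatterplot (a sketch only).

    Each point is snapped to the nearest cell of a width x height grid,
    which is why the accuracy of text plots is limited.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_max = min(xs), max(ys)
    x_span = (max(xs) - x_min) or 1.0   # avoid dividing by zero
    y_span = (y_max - min(ys)) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for (x, y), label in zip(points, labels):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y_max - y) / y_span * (height - 1))  # row 0 is the top
        grid[row][col] = label[0]       # "Label" mode: first character only
    return ["".join(row) for row in grid]


# Hypothetical cases from two groups, F(orest) and B(og).
scores = [(-1.2, 0.4), (-0.9, 0.1), (1.1, -0.3), (1.4, -0.6)]
names = ["Forest1", "Forest2", "Bog1", "Bog2"]
for line in text_scatterplot(scores, names):
    print(line)
```

Passing width=110 and height=55 would correspond to the wide text plot grid. Two points that fall in the same cell overwrite each other here; how MVSP handles such collisions is not stated in this section.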
With these plotted, you can tell at a glance how well the groups are distinguished by the analysis.

M - 400 LINE GRAPHICS MODE is a special mode used in AT&T 6300 and Compaq Portable III and 386 computers, among others. This is similar to CGA high resolution mode but uses a resolution of 640x400 rather than 640x200. MVSP can usually tell what type of display is being used, but these 400 line mode displays will be detected as CGA monitors. To take advantage of the 400 line mode, set this option to "Yes". The file ATT.BGI must be present in the directory containing the MVSP program files.

A - PLOTS PER ANALYSIS allows you to specify how many axes to plot for each analysis. If you know ahead of time that you want to see the first three axes plotted against each other, set this value to 3. You may wish to see the results before deciding how many axes to plot. In this case, enter "-1" for the number of plots; you will then be asked how many to plot as the procedure is running.

E - PRINT GRAPHICS AUTOMATICALLY specifies that you wish to have the graphics plots automatically printed rather than drawn on the screen. Set this option to "Yes" to do this. If you instead wish to examine the plot on the screen before deciding to print it, set this option to "No". When the plot is drawn on the screen, the program will pause to allow you to look at it. If you decide to print it, simply press the "P" key, otherwise press any other key to go on. Also use the "No" option if you aren't going to print the graphics mode plots, or if you are using the DOS GRAPHICS screen-dump facility to print them.

T - PRINTER SETUP option - This option allows you to specify what type of printer(s) you are using. Separate printers can be used for the output of text results and graphics plots, so that you could, for instance, have the results printed on a dot matrix printer and the graphs on a plotter or high resolution laser printer.
There are several options on this menu:

<Graphic placed here in printed manual>

T - TEXT PRINTER - This option allows you to choose one of several printers for the output of text results. This output will include the numeric results as well as graphs if text mode plots are chosen under the "Scatterplot/Dendrogram Type" option. The "Plain ASCII" option will send text to the printer without any control codes. The "Other" option allows you to specify the printer codes in a similar manner as in MVSP version 2.0. To do this you must first consult your printer manual to determine the codes needed for the desired text effect. Then, using this option, enter the decimal (not hexadecimal) codes with each individual value preceded by a backslash (e.g. "\27\69" for bold print on an Epson printer). You may enter the codes to set a certain text effect and to reset the printer to its default condition at the end.

Y - TEXT STYLE - MVSP can automatically set up your printer to use different text styles for the printouts. Normal printing gives output in your printer's default text mode. Compressed will give text that can fit up to 130 columns on a single page of A4 or 8.5"x11" paper. Tiny print is also compressed to allow for 130 columns but the text itself is also half as high as normal, allowing for twice as many lines per page as well. If you choose compressed or tiny print, make sure to set the "Page Width" option (under "Output Format") to 130 columns.

Z - PAPER SIZE - This allows you to choose the size of paper used in your printer. You may specify letter, legal, or A4 size. If you are using wide carriage paper, choose the size that matches the length of your paper.

P - GRAPHICS PRINTER - You may choose from several types of printers and plotters for your graphics output. You may also save the graphs to a .PCX bitmap file at 640x480 resolution.

M - GRAPHICS PRINTER MODE - Each printer type has a number of modes associated with it.
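Returning briefly to the "Other" text printer option described above: the backslash-and-decimal notation for printer codes can be illustrated with a small Python sketch that turns such a string into the bytes a printer would receive. The function name `parse_printer_codes` is hypothetical, and this shows only the notation as described in this manual, not how MVSP itself parses it.

```python
def parse_printer_codes(spec):
    """Convert a string such as r"\27\69" to raw bytes (a sketch only)."""
    if spec == "":
        return b""
    if not spec.startswith("\\"):
        raise ValueError("each code must be preceded by a backslash")
    # Split on backslashes; the first piece is the empty string before
    # the leading backslash, so skip it.
    values = [int(part) for part in spec.split("\\")[1:]]
    return bytes(values)    # raises ValueError if any code is above 255


codes = parse_printer_codes(r"\27\69")
print(list(codes))          # decimal values 27 (ESC) and 69 ("E")
```

Decimal 27 is the ESC character and 69 is the letter "E", so the example from the text is the two-byte sequence ESC E.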
These modes cover the resolution of output and/or the page size. The available modes vary for each printer type. Note that with some printers, most notably the HP Laserjet, the highest resolution printouts can often take a long time to complete.

D - GRAPHICS OUTPUT DEVICE - You may specify to which parallel or serial port your printer is attached. This allows you to have two printers attached to one computer on different ports (the text printer is always assumed to be on LPT1). You can also have graphics output directed to a file. You can then send it to the printer later using the DOS "COPY /B filename portname" command, where "filename" is the name you specify for the output and "portname" is LPT1, LPT2, COM1, or COM2.

You can also save graphics output to a file if you want to import the plot into a graphics program for further editing or inclusion in other documents. Many drawing, painting, and desktop publishing/word processing programs allow you to import graphs in a number of formats. The three graphics printer types in MVSP that can be used for this purpose are HP Plotter, Postscript, and bitmap. The Postscript files can be treated as Encapsulated Postscript files (.EPS or .AI) by many programs.

W - PLOT WIDTH (CM) - This option allows you to specify the width (in centimetres) of the graph on the page. The graph will be centred on the page. If the value you specify is larger than the page size it will be scaled to fill the page.

H - PLOT HEIGHT (CM) - This option, together with "Plot Width", allows you to specify the size of the graph.

DATA EDITOR

Data files may be constructed using the MVSP data editor. This editor is similar to a spreadsheet program. Data are entered and presented in a tabular format, with the rows being the variables and the columns being the individual cases or objects.

To use the data editor, first choose the "Manipulate Data" option from the main menu and specify a filename.
If that file exists, it will be loaded into the editor for modification; if not, you will be asked if you want to create a new file. You will now be presented with the Data Manipulation menu. Choose "Enter/Edit Data". If you are creating a new file, you will have the option of reading in data from another file for modification and saving under the new name.

You will next be asked to enter the maximum number of rows and columns needed for the data matrix. MVSP must set aside a certain amount of memory for working with the data matrix. If you are editing an existing data matrix and don't plan to add new rows or columns, then just accept the default values for rows and columns. If you are adding rows or columns to either a new or old matrix, then enter the maximum number needed. If you aren't sure of the exact number, over-estimate. This will only cause MVSP to set aside some extra memory while you are editing; it will not have any lasting effect.

You will also be asked to enter or modify a title for the file. This title identifies the data and will be printed out along with the results of each analysis. You may enter up to 79 characters, so be as descriptive as you can. Note, however, that the "Distances and Similarities" procedure uses the last few characters of the title to place a label on the output symmetric matrix file identifying the coefficient used. If you use most or all of the 79 available characters, make sure no vital information is at the end of the title, or it will be overwritten by the identifier.

Entering data labels:

When creating a new file, you are first presented with a blank spreadsheet with the cursor in the upper left corner. You must first enter some labels for the rows and columns. You will notice that the cursor will only move about in rows and columns that have labels or in the next blank row or column. This is to avoid having spurious values placed in areas that aren't meant for data.
By entering a row or column label, you are telling the program that this is another variable or case to include in the matrix. When you enter a new label, that row or column will then be filled with zeros to indicate that it is now considered part of the data matrix.

To enter labels, all you need to do is to start typing the label while the cursor is in the desired row or column. When you start typing, the bottom line will display the word "INPUT>" and the characters you type will appear on this line. You may edit the text using the backspace, cursor, insert, and delete keys, as described in the section "Entering and editing text" above.

Once you are finished typing the label, you then place the label in the matrix by pressing one of the cursor keys (but not the Enter key). This tells the editor whether the label is for a row or column. Pressing the up or down arrow cursor key declares that label to be for a row, while a left or right arrow key indicates a column label. The cursor will also move in the appropriate direction so that you are ready to enter another label.

The labels themselves can be up to ten characters long and may consist of any printable character, except spaces. The following are all valid labels:

    ROW1
    COLUMN_2
    1st-Loc.
    #3-Site

This label is NOT valid:

    SITE 1

It will be read as two labels, "SITE" and "1". If you are using labels that begin with a number (such as 1st-Loc.), you must precede the label with a single or double quote (' or ") so that the program will know that you are not attempting to enter numeric data.

Entering data:

Once you have a few labels entered, you may start entering the data themselves. This is done in a similar way to the labels; just start typing a number when the cursor is in the appropriate place. Input of each number is finished by pressing one of the cursor keys or the "Enter" key.
If you enter any characters that cannot be converted to numeric form, an error message will be displayed and you may edit the input to correct the mistake. The only valid characters for numeric data are '0'-'9', '-', '+', '.', and 'E'. The 'E' is used for entering numbers in scientific notation, so that 0.00001 (1.0 x 10^-5) may also be entered as "1.0E-05". Binary (presence/absence) data should be entered as "0" and "1", with a "0" indicating absence.

Editing labels and data:

Editing of data and labels can be done in two ways. In either case, the cursor must first be placed in the appropriate row and column. Then you may either type the value anew, as you would for entering data, or you may use one of the editing function keys. The function of each of these keys is listed at the bottom of the screen. Pressing F2 will bring the datum to the bottom line of the screen, where it can be edited. F3 will allow you to edit the row label and F4 the column label. These can be edited and entered into the matrix as described above.

Saving data matrix:

Pressing the F9 key will save the data matrix to a file along with all the changes you have made so far. I would suggest doing this frequently to avoid losing any changes you have made due to malfunction or mistakes in editing. The F10 key will save the changes and exit back to the main menu.

If you decide to abandon the current editing session, press the ESC key. You will first be asked to confirm that you want to exit, then you will be returned to the main menu. All changes made since your last save will be lost.

DATA FILE FORMAT

Data files from other sources can be imported to MVSP either directly or with minor editing. If your data are in Lotus 1-2-3 or Symphony worksheets or in files for the Cornell Ecology Programs (Decorana and Twinspan) then they can be imported directly (see page 22). Otherwise the data can be transferred as text files.
Most database and spreadsheet programs have an option for outputting data to plain text (ASCII) files. A word processor or text editor can then be used to modify the resulting files to the appropriate format for MVSP (mainly by adding the file header information, discussed below).

Data files for MVSP must be in ASCII format. This means that they should consist only of letters or numbers, spaces, and most other symbols represented on the keyboard. Many word processors insert special formatting characters that MVSP cannot read. You can check whether your word processor is one of these by listing a word processed file to the screen with the DOS TYPE command and looking for strange characters. If your word processor uses these extra characters, make sure you modify your data files in a non-document mode that creates normal ASCII files.

DATA FILE HEADER: The first line of the data file should be a header line, which will give the program some information about the data, such as the number of rows and columns. It should look something like this:

    * 10 15

This header line should begin with an asterisk ("*") in the first column of the first line of the file. This asterisk tells the program that a header is present. If the asterisk is not found, the program assumes that the header information is not present, and it will prompt the user for the information. MAKE SURE that if this header information is present, there is an asterisk before it; if not, the header information will be read as data! The two numbers are the number of rows and columns in the data matrix. The above example has 10 rows and 15 columns.

You may also include data labels in the data file. These labels will be printed on your output to help make sense of the masses of numbers that will be spewed out. If labels are included, this must be specified in the file header.
For example:

    *L 10 15

specifies a data file that includes data labels and that has 10 rows and 15 columns (NOT including the labels themselves). The "L" must come immediately after the "*", with no intervening spaces, or it will be read as the number of rows, and an error will occur. The numbers of rows and columns must be separated by at least one space from each other.

DATA LABELS: The format of the data labels is explained above under "Entering Data Labels". When data labels are included, both row and column labels must be present. The column labels should be in the second row of the data file, after the header line, and the labels should be separated by at least one space. The labels may be continued on to subsequent lines; the program will continue reading column labels until it has read as many as the number of columns you have specified in the header line. Row labels occur on the same line as the data row to which they apply, and should precede the first datum in that row, with a space separating the label and datum.

DATA FILE TITLES: A title may also be added to your data file on the header line, so that you know what these data represent. Here's an example:

    *L 10 15 Test data file for MVSP

This title will be listed to the screen and placed on the output when that file is selected. It must be separated from the other elements of the header by at least one space, and it cannot be more than 79 characters long. The Distances and Similarities procedure will also place this title in the header of the matrix output file, along with the specification of which coefficient was used, so that the title is carried over to the clustering program.

DATA MATRIX: The data matrix itself should consist of the data points separated by at least one space. The data for one row can be continued on the next line. If the number of rows or columns you specify is wrong, the data matrix will be read incorrectly, often without warning.
If you have a 10x10 matrix without labels and specify 9 columns by mistake, the last datum on the first row will be read as the first datum of the second row, and so on. This, needless to say, can raise havoc with your results! All procedures can print out the raw data so that you can check to make sure it was read correctly.

Here is a complete example data file:

    *L 5 10 Test data set for MVSP
    COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10
    ROW1  23   2   4  53   6  45   2   3  67   5
    ROW2  10   2   4  34   1   4   3  10  20   3
    ROW3   2  34   0   1  35  12   1  90  10   9
    ROW4  98  12  10   4  10   9  10   5  20  31
    ROW5   1   7   9  11  75   7   5  21   0  10

The input data files for the cluster analysis and PCO programs use a slightly different header format. Here is an example:

    *L 15 DIS Test data set for MVSP

Since the clustering and PCO programs use a symmetrical matrix as input, the header needs only one number for the size of the data matrix. In this case the size of the matrix is 15x15. The third element of the header is a three letter abbreviation specifying whether the matrix is a similarity (SIM) or distance (DIS) matrix. This code MUST be separated from the number of objects by only one space, or it will not be read correctly. The "Distance and Similarity" procedure of this program automatically sets up its output files in this manner for input into these procedures.

Here is an example of a symmetrical input file, generated from an analysis of the above matrix, using the Spearman Rank Order Correlation Coefficient:

    *L 10 SIM Test data set for MVSP - SPEARMAN
    COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10
     1.00
    -0.15  1.00
     0.36 -0.05  1.00
     0.20 -0.97  0.05  1.00
    -0.60  0.67  0.15 -0.60  1.00
     0.30  0.21 -0.31 -0.00  0.10  1.00
     0.30 -0.05  0.97  0.00  0.10 -0.50  1.00
    -0.80  0.62 -0.41 -0.70  0.60 -0.30 -0.30  1.00
     0.82 -0.55 -0.03  0.62 -0.82  0.41 -0.10 -0.87  1.00
     0.10  0.67  0.67 -0.60  0.70  0.10  0.60  0.10 -0.41  1.00

Note that this is a lower half matrix, with diagonals (the 1.00's) included.
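The two file layouts described above can be read with a few lines of code in another language if you need to check a file outside MVSP. The following Python sketch is not part of MVSP; the function names are hypothetical, and it assumes that labels never contain spaces or a leading quote, that the header is always present, and that symmetric files are stored as a lower half matrix with the diagonal included.

```python
def parse_header(line):
    """Return (has_labels, sizes, kind, title) from an MVSP-style header line."""
    if not line.startswith("*"):
        raise ValueError("no header: the first line must start with '*'")
    tokens = line.split()
    has_labels = tokens[0] == "*L"
    rest = tokens[1:]
    sizes = []
    # Data files carry two sizes (rows, columns); symmetric files carry one.
    while rest and rest[0].isdigit() and len(sizes) < 2:
        sizes.append(int(rest.pop(0)))
    kind = None
    if len(sizes) == 1 and rest and rest[0] in ("SIM", "DIS"):
        kind = rest.pop(0)
    return has_labels, sizes, kind, " ".join(rest)


def read_data_file(text):
    """Read a rectangular data file; values may wrap on to following lines."""
    lines = text.strip().splitlines()
    has_labels, (n_rows, n_cols), _, title = parse_header(lines[0])
    items = iter(" ".join(lines[1:]).split())
    col_labels = [next(items) for _ in range(n_cols)] if has_labels else []
    row_labels, matrix = [], []
    for _ in range(n_rows):
        if has_labels:
            row_labels.append(next(items))  # the label precedes the first datum
        matrix.append([float(next(items)) for _ in range(n_cols)])
    return title, col_labels, row_labels, matrix


def read_lower_half(text):
    """Read a symmetric file and expand the lower half to a full matrix."""
    lines = text.strip().splitlines()
    has_labels, (n,), kind, title = parse_header(lines[0])
    items = iter(" ".join(lines[1:]).split())
    labels = [next(items) for _ in range(n)] if has_labels else []
    full = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # row i holds i + 1 values, ending on the diagonal
            full[i][j] = full[j][i] = float(next(items))
    return title, kind, labels, full
```

Because the values are read as one stream of items, a wrong row or column count in the header shifts every later value, which is the misreading the manual warns about.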
Other forms of matrices may also be specified for input to the clustering program, as discussed below, but the lower half matrix is the default output format of the similarities and distances procedure.

DATA MANIPULATION

When you choose the "Manipulate Data" option from the main menu, you are first asked to provide an input filename. You may then transpose or transform those data, convert them to other scales, or drop rows or columns that are either selected by you or have totals of zero. Any combination of these options may be chosen. When the Run command is chosen, a new data file will be produced with all the changes you have selected. You will be asked to provide a name for the new file; this must be different from the input file.

<Graphic placed here in printed manual>

Transform data:

The Transform Data option allows you to choose to have the data log or square root transformed before analysis. Most of these procedures assume a normal distribution of the data, but this assumption is often not met. Log transforming the data can reduce the skewness of the data (Sokal & Rohlf, 1981), resulting in a more interpretable analysis. In my work with fossil plant data, I've found this to be invaluable, as I always have some samples with extremely high abundances of certain taxa, and these taxa tend to dominate the analysis due to their large numbers. Log transforming the data evens this out. You are given the option of what base of logarithm to use. Square root transformation is also often used when the data are in the form of counts.

Please note that the log transformations are performed on the values x+1, rather than x. This is done to avoid computer errors when the data value is 0, since the log of 0 is undefined, and to avoid negative results when the value is less than 1.

The logratio transformation (Aitchison, 1986) was designed specifically for compositional (percentage or proportional) data.
These data are affected by closure, in which the increase of one variable necessitates the relative decrease of another, even if the absolute value of the other doesn't change. This can cause many problems in statistical analyses. The logratio transformation eliminates the closure problem by replacing the proportions with the log of the ratio between the proportion and the geometric mean of the sample. In mathematical terms, this is:

    x'_{i,j} = \log(x_{i,j} / g_i)

where:

    x_{i,j}  = proportion of taxon j in the ith sample
    x'_{i,j} = transformed value
    g_i      = (x_{i,1} \times \cdots \times x_{i,n})^{1/n} = geometric mean
    n        = number of taxa in the sample

It should be emphasized that, for the logratio transformation to be calculated properly, the samples MUST be the columns of the data file. Otherwise the calculations will be meaningless.

Problems arise with the logratio transformation when some of the proportions are zeros, since taking the log of zero produces an error. This is remedied in MVSP by replacing them with a very small value and then readjusting all other proportions so that the total is 1.0. The replacement values are calculated using Aitchison's (1986, p.269) zero replacement formula. This formula incorporates a maximum rounding-off value that can affect the final results. You can set this value when choosing the logratio transformation. The new value can be saved to the configuration file. You may want to try several runs with different values to assess the effect.

Transpose data:

The "Transpose Data" option is another that is common to all procedures. This allows you to transpose a matrix before analysis, so that the rows of the matrix are treated as columns.

Convert data:

Convert Data allows you to change the scale of the data to percentages, proportions, standardized scores, binary, the octave class scale, or range-through type stratigraphic data.
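Returning briefly to the "Transform data" option, the log(x+1) and logratio transformations described there can be sketched in Python as follows. This is an illustration under stated assumptions, not MVSP's own code: the function names are hypothetical, natural logarithms are assumed for the logratio, and Aitchison's zero replacement formula is not reproduced, so the sketch simply refuses data containing zeros. As in the manual, samples are the columns of the matrix.

```python
import math


def log_transform(matrix, base=10.0):
    """Apply log(x + 1) to every value, so that a zero stays at zero."""
    return [[math.log(x + 1.0, base) for x in row] for row in matrix]


def logratio_by_sample(matrix):
    """Logratio transform with samples as columns and taxa as rows.

    Each value becomes log(x / g), where g is the geometric mean of the
    sample (column) that the value belongs to.
    """
    n_taxa = len(matrix)
    n_samples = len(matrix[0])
    out = [[0.0] * n_samples for _ in range(n_taxa)]
    for j in range(n_samples):
        column = [matrix[i][j] for i in range(n_taxa)]
        if any(value <= 0 for value in column):
            # Assumption: zero replacement is left to the caller.
            raise ValueError("zeros must be replaced before the logratio")
        # log of the geometric mean = arithmetic mean of the logs
        log_g = sum(math.log(value) for value in column) / n_taxa
        for i in range(n_taxa):
            out[i][j] = math.log(column[i]) - log_g
    return out
```

A quick check on the result is that the transformed values of each sample sum to zero, which follows directly from dividing by the geometric mean.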
In percentage and proportional data, the values are adjusted so that the columns sum to 100 or 1.0 (respectively). The standardized scores are adjusted by rows to zero mean and unit standard deviation. Binary converts all non-zero values to 1. It is sometimes useful to perform analyses on binary data to remove the effects of abundance on the results.

The octave scale, which is often used in plant community ecology (Gauch, 1982), is a ten point abundance class scale, roughly based on log2. Percentage data are converted to the classes based on the following scale:

    Percentage      Class
    0             = 0
    >0   - 0.5%   = 1
    >0.5 - 1%     = 2
    >1   - 2%     = 3
    >2   - 4%     = 4
    >4   - 8%     = 5
    >8   - 16%    = 6
    >16  - 32%    = 7
    >32  - 64%    = 8
    >64  - 100%   = 9

The scale was first developed as a convenience for visual estimation of abundances. It may also be used to convert fully quantitative data to a simpler scale. Much of the minor variation in abundances can be viewed as stochastic noise rather than significant trends (Gauch, 1982). By breaking the data into ten classes this minor variation is eliminated and only the major 'signal' is preserved. Arguably, the multivariate methods provided by MVSP should also separate out these major trends, leaving the noise in the background, but very noisy data can complicate this and, in ordinations, will cause the major trends to account for only a small proportion of the total variance. In comparisons of PCAs performed on raw, octave transformed, and logratio transformed data, the octave scale performed as well as the logratio, with very little difference in the results on the first three axes (Kovach & Batten, in press).

The range-through conversion is provided as a convenience to geological biostratigraphers. In analysing the stratigraphic distribution of fossil organisms it is often desirable to treat taxa as being present in all samples between the first and last occurrence in a vertical sequence.
This assumes that the absence of a species in the middle of its range is due to ecological differences or sampling bias rather than the actual absence of the organism from the region at that time. In performing the range-through conversion, it is assumed that the columns are samples, that they are arranged in stratigraphic order, and that the data values are abundance or presence-absence (with absence indicated by a 0). Each row (taxon) is scanned for the first and last occurrence of that taxon, then those and all samples in between are converted to 1's, to indicate the presence of that species. All other samples are left at 0.

Drop rows and columns:

There will often be times when you wish to analyse a subset of your data. This option allows you to easily create new data files that are subsets of another. When this option is set to "Yes" and the procedure run, you will be presented with a list of the row labels. You may move the cursor around the list and mark the labels of the rows you wish to delete by pressing the space bar. This will cause that label to be shown in reverse highlighting (after you move the cursor), indicating that it will be dropped. You may unmark the label by pressing the space bar again. When all the labels to be dropped are marked, press the carriage return. You will next see a list of the column labels, which you can mark in the same way. Press the carriage return again and a new data file will be created without those elements that were marked.

Drop zero elements:

This option will scan through the data matrix looking for and removing any rows or columns that have totals of zero. Often when rows and columns are dropped using the previous option, some cases or variables are left with only zero elements. This can cause problems with some procedures: the CA procedure won't work at all if any rows or columns contain only zeros, and such rows or columns can distort the results of other analyses.
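The octave, range-through, and zero-dropping operations described above are simple enough to sketch in a few lines of Python. This is only an illustration of the rules as stated in this manual, not MVSP's own code; the function names are hypothetical, and the sketch assumes the input percentages lie between 0 and 100.

```python
# Upper limits of octave classes 1 to 9, in percent (class 0 is exactly zero).
OCTAVE_UPPER_LIMITS = [0.5, 1, 2, 4, 8, 16, 32, 64, 100]


def octave_class(percent):
    """Convert a percentage to its octave abundance class (0 to 9)."""
    if percent < 0 or percent > 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent == 0:
        return 0
    for cls, upper in enumerate(OCTAVE_UPPER_LIMITS, start=1):
        if percent <= upper:
            return cls


def range_through(row):
    """Mark a taxon as present (1) from its first to its last occurrence."""
    occurrences = [k for k, value in enumerate(row) if value != 0]
    if not occurrences:
        return [0] * len(row)
    first, last = occurrences[0], occurrences[-1]
    return [1 if first <= k <= last else 0 for k in range(len(row))]


def drop_zero_totals(matrix):
    """Remove every row and every column whose total is zero."""
    kept_rows = [row for row in matrix if sum(row) != 0]
    kept_cols = [j for j in range(len(matrix[0]))
                 if sum(row[j] for row in matrix) != 0]
    return [[row[j] for j in kept_cols] for row in kept_rows]
```

For example, a taxon recorded as 0 3 0 0 1 0 across six samples in stratigraphic order becomes 0 1 1 1 1 0 after the range-through conversion.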
It is a good idea to set the "Drop zero elements" option as well when you are choosing rows and columns to be deleted.

IMPORT/EXPORT DATA

You can transfer data between MVSP and two other file formats. This is done through the "Import/Export Data" menu option. With this option, you are first presented with a menu allowing you to specify which file format you want to use and whether to import or export data. When you choose "Run", you are then asked for the file name, and can request a directory listing of all files of that format.

CEP Files:

These are files produced for the Cornell Ecology Program series, including DECORANA and TWINSPAN, as well as related programs such as Cajo T.F. ter Braak's CANOCO program (ter Braak, 1986). These programs use a compressed data file format in which only non-zero abundances are included. The data are presented in couplets, with the first number indicating the taxon (variable) and the second being the actual abundance. The couplets for each sample are grouped on one or more lines, with the sample number being specified at the beginning of each. This import option works in a similar way to, and replaces, the separate utility REFORMAT that was distributed with earlier versions of MVSP.

Lotus 1-2-3/Symphony Files:

The spreadsheet files used in Lotus' programs 1-2-3 and Symphony can be read and written by numerous other programs, so that this format has become a common means of data exchange among IBM-PC compatible software. MVSP can read files produced for 1-2-3 versions 1 and 2, as well as Symphony version 1 files (those with the extensions .WKS, .WK1, and .WK2). The files it produces are .WKS files, for 1-2-3 version 1.

When reading Lotus files, MVSP assumes that the data are in a matrix form, similar to the MVSP file format itself. First, it will assume that you have a title for the data file at the top of the spreadsheet grid, preferably in row 1.
The next row will contain the column labels, with each label occurring in the same column as the associated data. Next the actual data will occur, with each row of data (variables) on a single row of the spreadsheet grid and each sample in a single column. The row labels occur on the same row as their associated data and occur before any of the data (preferably in column A).

MVSP will scan through the file first before actually importing it to determine exactly where the data and labels are located, so there is some scope for flexibility in the placement of the data. However, if you follow the format specified above there is less chance of failure in importing the data. After the data have been imported you will want to check the resulting matrix for columns or rows with the labels "ROWn" or "COLn", where n is a number. These indicate that MVSP overestimated the extent of the data matrix (usually due to stray cells in the spreadsheet). These should be filled with zeros but if they actually contain data, then you will need to check the rest of the data and the spreadsheet for inconsistencies.

Any formulae in the data matrix will be read as numbers, with the current result of the formula being placed in the MVSP file. Otherwise, the data are assumed to be numbers. If any non-numeric data are found in the area where MVSP expects to find numeric data, they will be replaced with a missing value marker (-999999999) so that you can easily find them and replace them with meaningful data. You will be warned if this has occurred during import.

RUNNING NUMERICAL PROCEDURES

When one of the numerical procedure options (A-F on the main menu) is chosen, you will be asked for the name of the input data file. The program will automatically add your default extension if none is specified. So, if your datafile is named "STUDY1.MVS" and your default extension is .MVS, you need only type "STUDY1".
If you specify another extension, or have a filename with no extension, the program will recognize those as long as the full name is specified. Pressing the carriage return while the line is blank will return you to the main menu.

You may obtain a directory of the default data disk and path by typing a "?". You may then specify a certain file mask, such as "*.MVS" for all files with a .MVS extension or "*.*" for all files. You will then be presented with a list of all files matching that specification. You can now move the cursor around with the cursor keys until you find the one you want, then press "Enter" to select that file. If there are more files than can fit on one screen you can use the PageUp and PageDown keys to move between screens. Pressing ESC will take you back to the filename prompt.

After an input file has been selected, you will be presented with an "Analytical Defaults" menu. This allows you to set a number of options concerning the analysis about to be performed. The following is the menu for the principal components analysis procedure:

<Graphic placed here in printed manual>

The "Transform" and "Transpose Data" options work in a similar way to those in the Data Manipulation procedure. All procedures also allow you to access the "Change Program Defaults" menu, described above, and to save the new defaults. "Quit" will return you to the main menu, and "Run" will initiate the running of the procedure.

The "Printed Output" option allows you to specify what ancillary information is to be output as well as the destination of the output. You may select to have the raw or transformed data printed or other intermediary results such as the similarity matrix in the eigenanalysis procedures. It is useful in initial analyses to see the original data to ensure they have been read correctly, and it can be informative to peruse the intermediate results.
In the eigenanalysis procedures, you can also choose to have the results graphed or to have the original data matrix sorted by first axis scores and printed. This can be useful for seeing patterns in the original data.

<Graphic placed here in printed manual>

The "Save to WKS File" option allows you to specify that the resulting eigenvalues and scores from the ordination will be separately saved to a Lotus-format file. This allows you to use a spreadsheet or graphics program to produce plots of the scores or to do further numerical analyses with the scores. Note that all the results are stored in the file, with the eigenvalues and percentages at the top, followed by the component loadings or species scores, and ending with the component scores or sample scores. If you wish to transfer just one block of scores to your other programs you may need to edit the spreadsheet file and delete the unneeded rows. Alternatively, some programs that import Lotus files allow you to specify in what row and column the results start.

The "Output Destination" menu allows you to choose whether to send output to the printer or a file and whether to also show the results on the screen. I often find it useful to send the results to a file so that they can be input to other programs. For instance, if you have a publication quality graphics program available, you can edit the output file, deleting the extra text so that only the loadings and scores are left, and then import these coordinates into the graphics program for plotting, thus saving you from having to retype them. You may first get a hard copy of the results by sending the file to the printer with the DOS command

    COPY filename PRN

or

    PRINT filename

If you have specified that the output should be sent to a file, you will be prompted for the name of the output file when you run the analysis.
If you enter a blank carriage return, this output file will default to the input file name plus the default output file extension you have specified. The output file for an analysis of STUDY1.MVS will default to STUDY1.OUT if your default output extension is *.OUT.

Principal components analysis:

This procedure performs an R-mode principal components analysis. The component loadings are scaled to unity, so that the sum of squares of an eigenvector equals 1, and the component scores are scaled so that the sum of squares equals the eigenvalue. Q-mode PCA will generally have the opposite scaling. Note that many packages, such as SPSS and SYSTAT, perform Q-mode PCA, and thus their eigenvectors will be scaled to the eigenvalue, rather than unity.

For details on the computation and assumptions of the technique, see Orloci (1978), Gauch (1982), Pielou (1984), Manley (1986), and Jolliffe (1986). Orloci and Jolliffe give detailed mathematical discussion of PCA, while Gauch, Pielou and Manley give very clear and understandable discussions of the basis of the technique and its use and assumptions.

In the R-mode analysis, similarity coefficients are calculated for the descriptors (or variables), which are the rows of the matrix, and component scores are calculated for the objects (or cases), which are the columns of the matrix.

STANDARDIZATION AND CENTRING - The Analytical Defaults menu has two options that affect how the PCA is calculated. You may choose to standardize the similarity matrix before eigenanalysis (thus creating a correlation rather than a covariance matrix), and you may use either a centred or uncentred data matrix. Generally a centred covariance matrix is used, but if different units of measurement are used in the data matrix, these will need to be standardized, and thus a correlation matrix should be used.
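To make the difference between these choices concrete, the following Python sketch builds the similarity matrix among descriptors (rows) with or without centring and standardization. It is an illustration only: the function name and its parameters are hypothetical, the divisor n-1 is an assumption (the manual does not state which divisor MVSP uses), and a descriptor with no variation would cause a division by zero when standardizing.

```python
import math


def similarity_matrix(data, standardize=False, centre=True):
    """Covariance (or correlation) matrix among the rows of `data`.

    Rows are descriptors (variables) and columns are objects (cases),
    as in an R-mode analysis.
    """
    n_objects = len(data[0])
    adjusted = []
    for row in data:
        mean = sum(row) / n_objects if centre else 0.0
        adjusted.append([x - mean for x in row])
    # Assumption: sums of cross-products are divided by n - 1.
    matrix = [[sum(a * b for a, b in zip(ri, rj)) / (n_objects - 1)
               for rj in adjusted] for ri in adjusted]
    if standardize:
        sd = [math.sqrt(matrix[i][i]) for i in range(len(matrix))]
        matrix = [[matrix[i][j] / (sd[i] * sd[j])
                   for j in range(len(matrix))] for i in range(len(matrix))]
    return matrix
```

With two descriptors measured in different units, say 1 2 3 and 2 4 6, the covariance matrix gives the second descriptor four times the variance of the first, while the correlation matrix treats them equally.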
Standardization may also be desired in ecological studies to reduce the effects of dominant species, so that rarer species play a greater role in the resulting configuration.

An uncentred data matrix is called for when there is appreciable between-axes heterogeneity. This means that different clusters of points are associated with different axes, and have little projection on other axes. This often occurs when different groups of samples have completely different sets of common species, with little overlap. See Noy-Meir (1973) and Pielou (1984) for more on this phenomenon.

MINIMUM EIGENVALUE - You may also specify the minimum eigenvalue for which components are printed out. The possible options are to have all components printed, only those above a certain eigenvalue that you supply, or to base the minimum eigenvalue on one of two rules. Kaiser's rule states that the minimum eigenvalue should be the average of all eigenvalues (or 1 if the correlation matrix is used). This is often considered a good rule of thumb for determining whether a component is interpretable (Legendre & Legendre, 1983). Jolliffe (1986) proposed a modification of this rule in which the minimum eigenvalue is 0.7 times the average eigenvalue. This will usually give one or more extra components over Kaiser's rule.

ACCURACY - The accuracy and speed of the eigenanalysis can be controlled by using the "Accuracy of Solution" option. Eigenanalysis in MVSP is performed using the cyclic Jacobi method, which is an iterative procedure that makes repeated passes through the matrix improving the accuracy of the solution. The iterations stop when a certain level of accuracy, which is supplied by the user, is reached. Greater accuracy in the solution means that more passes must be made through the matrix, therefore the program takes longer to run.
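For readers who want to see how such an iteration behaves, here is a minimal Python sketch of a cyclic Jacobi eigenanalysis together with the two minimum eigenvalue rules. It is not MVSP's own code. The function names are hypothetical, and the stopping test (the size of the off-diagonal elements falling below the requested accuracy) is an assumption, since the manual does not say exactly how MVSP measures accuracy.

```python
import math


def jacobi_eigen(matrix, accuracy=1e-6, max_sweeps=50):
    """Eigenvalues and eigenvectors of a symmetric matrix by cyclic Jacobi."""
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        # Assumed stopping rule: off-diagonal elements are small enough.
        off = math.sqrt(sum(a[i][j] * a[i][j]
                            for i in range(n) for j in range(n) if i != j))
        if off <= accuracy:
            break
        for p in range(n - 1):          # one pass through the matrix
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that makes a[p][q] zero.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):      # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):      # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = [[v[i][k] for i in range(n)] for k in order]
    return values, vectors


def minimum_eigenvalue(eigenvalues, rule="kaiser"):
    """Kaiser's rule (the average) or Jolliffe's (0.7 times the average)."""
    average = sum(eigenvalues) / len(eigenvalues)
    return average if rule == "kaiser" else 0.7 * average
```

Each eigenvector returned has a sum of squares of 1, matching the scaling of the component loadings described above. Asking for a smaller accuracy value simply causes more passes to be made before the loop stops.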
The accuracy level that you supply to MVSP usually turns out to be roughly equal to the number of correct significant digits in the loadings and scores of the most important components (those greater than 10% of the total variance), so that a level of 1.0 x 10^-6 means that these should have roughly six significant digits. You can experiment with different levels to determine the trade-offs between speed and accuracy.

RUNNING THE ANALYSIS - Choosing "Run" will initiate the analysis. Status messages will be listed to the screen during the analysis to let you know how things are proceeding. When it is done, the eigenvalues and their percentage of the total variation will be printed along with the component coefficients (or eigenvectors), then the component scores for each principal component will be calculated and printed.

If you have chosen to have the results graphed and have provided a set number of axes to plot through the Graphics Options menu, then these plots will be produced automatically. If the "Plots Per Analysis" option on the Graphics Options menu has been set to "Ask", you will first be prompted to enter the number. Entering a zero will bypass the plotting procedure. See the Graphics Options section above for more details about plotting in MVSP.

Principal coordinates analysis:

Principal coordinates analysis (PCO) is a generalized form of PCA. Whereas PCA implicitly uses either a covariance or correlation matrix, PCO allows you to input any matrix of metric values. PCO may be used with any of the distances calculated by MVSP except for the squared Euclidean distance. Of the similarity measures only Gower's is metric. PCO is calculated as a Q-mode eigenanalysis, therefore it only gives the eigenvectors, not scores. Note that a PCO of Euclidean distances will give the same results as a Q-mode PCA.

Many of the options available for PCA are not applicable to PCO.
There is one new option:

MATRIX INPUT - A matrix of distance measures must first be calculated using the "Distances and Similarities" procedure (see below). This matrix is then read by the PCO procedure and the eigenvalues and eigenvectors are calculated. A number of different input formats are available, including various forms of half matrices and full matrices. This defaults to the same form specified in the "Matrix Output" option of the "Distances and Similarities" procedure.

Correspondence analysis:

The correspondence analysis (or reciprocal averaging) procedure performs several varieties of correspondence analysis (Pielou, 1984; see also Hill, 1973, Gauch, 1982, Greenacre, 1984), including detrended correspondence analysis (DCA; Hill & Gauch, 1980). Correspondence analysis in general is well suited for working with count or presence/absence data, whereas PCA is geared more towards measurement data on a continuous scale (although PCA can also be performed on count and binary data; Jolliffe, 1986).

DCA was developed by Hill and Gauch (1980) in order to correct two flaws in most ordination techniques. The "arch effect" or "horseshoe effect" is a common feature of most ordinations. This is manifested by the points on the ordination plot being arranged along an arch on the first two axes, rather than a straight line as expected if the first axis represents a gradient. This is an artifact of the data reduction process that occurs in ordination and represents a mathematical relationship between the first two axes, which are supposed to be independent. As a result of this arch, the second flaw occurs in which the points at either end of the first axis are closer together than those in the middle. These flaws also occur in subsequent axes.

DCA was designed to remove this arch from the ordination diagram.
It does this by dividing the first axis into a number of segments, then adjusting the scores of the points on the second axis so that the mean score within each segment is the same. Thus it is like cutting the plot into a number of vertical strips and moving each up and down until the points are in a straight line. The scores are also adjusted along the first axis so that they are more evenly spread. This method can often give more interpretable results, but it can also introduce distortion of its own. It is always a good idea to try both regular and detrended correspondence analysis on a data set and compare the results.

Detrended correspondence analysis assumes that the actual data being analysed are abundances of a set of variables (taxa in an ecological study) in a set of samples. Presence/absence data may also be used (entered as 0 and 1), but none of the data may be negative. It is also assumed that the samples come from a gradient in which different variables (taxa) characterize different parts of the gradient. Although it is most commonly used in ecology, this method may also be used in other fields where these assumptions hold, such as archaeology or market research.

Many of the options in CA/DCA are similar to those in the PCA procedure. There are several new ones:

ALGORITHM - MVSP normally uses the cyclic Jacobi method of calculating ordinations. This method calculates the scores for all axes simultaneously. However, the detrending process cannot be performed with this algorithm, since each axis must be detrended against the final scores of the previous axis. Thus an alternative algorithm can be used in which the solution for each axis is calculated separately. This is done using the reciprocal averaging method described by Hill (1973). The two algorithms are referred to as "Cyclic Jacobi" and "Reciprocal Averaging" respectively. Reciprocal averaging must be used if detrending is desired.
You may also want to use the reciprocal averaging algorithm for non-detrended analyses. The algorithm only extracts the first four axes and is usually much faster than the eigenanalysis by the cyclic Jacobi algorithm, which must extract all axes. This is most pronounced with large data sets. However, you often need to see more than the first four axes, particularly if the first four do not account for much of the total variability in the data set. Also, in cases where two or more of the axes have similar eigenvalues the reciprocal averaging method may not give accurate results. If this happens a warning message will be displayed.

The actual scores produced using the two algorithms will differ, because the scaling is different, but the actual configuration on a plot will be the same. The scores produced by the reciprocal averaging method will be scaled to the standard deviation of the species abundance along the gradient represented by the axis. If we assume species abundance along a gradient is normally distributed, then a species will appear, rise to its highest abundance, and disappear in about 4 standard deviation units (sd). Thus if the ordination axis is relatively short (less than 3-4 sd units) then the species turnover along the gradient will be low, whereas long axes (say 12 sd units) will probably have completely different sets of species at either end. Following Hill's original DECORANA program, the sd units are multiplied by 100, so a distance of 400 along the axis represents 4 sd units. The scaling of the axes produced by the eigenanalysis algorithm will be related to the original species abundances, unless the option is chosen to scale them to percentages.

DETRENDING - This option invokes the detrending procedure. It can only be used with the reciprocal averaging algorithm and the setting of the algorithm option will be changed when this option is chosen.
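The reciprocal averaging idea mentioned under the ALGORITHM option can be sketched in a few lines of Python. This is only an illustration of the basic iteration for the first axis (species scores as weighted averages of sample scores, then sample scores as weighted averages of species scores, then rescaling), not a reproduction of MVSP or DECORANA: the function name reciprocal_averaging, the 0-100 rescaling of the scores, the starting values, and the convergence test are assumptions of this example. It does no detrending, extracts no further axes, and does not produce the sd-unit scaling described above. The data are assumed to have species as rows and samples as columns, with no empty rows or columns.

```python
def reciprocal_averaging(data, tol=1e-9, max_iter=1000):
    """First-axis species and sample scores by repeated weighted averaging."""
    n_rows, n_cols = len(data), len(data[0])
    row_tot = [sum(row) for row in data]
    col_tot = [sum(data[k][j] for k in range(n_rows)) for j in range(n_cols)]
    # Arbitrary starting sample scores (assumption: evenly spaced 0..100).
    sample = [100.0 * j / (n_cols - 1) for j in range(n_cols)]
    species = []
    for _ in range(max_iter):
        # Species score = abundance-weighted average of the sample scores.
        species = [sum(data[k][j] * sample[j] for j in range(n_cols)) / row_tot[k]
                   for k in range(n_rows)]
        # Sample score = abundance-weighted average of the species scores.
        new = [sum(data[k][j] * species[k] for k in range(n_rows)) / col_tot[j]
               for j in range(n_cols)]
        # Rescale to 0..100 so the scores do not shrink towards a constant.
        lo, hi = min(new), max(new)
        new = [100.0 * (x - lo) / (hi - lo) for x in new]
        change = max(abs(x - y) for x, y in zip(new, sample))
        sample = new
        if change < tol:
            break
    return species, sample

# Hypothetical abundances: four species (rows) in four samples (columns).
abundances = [[5, 3, 0, 0],
              [2, 4, 2, 0],
              [0, 2, 4, 2],
              [0, 0, 3, 5]]
species_scores, sample_scores = reciprocal_averaging(abundances)
```

With data like these, where each species peaks in a different part of the sequence, the sample scores come out ordered along the underlying gradient.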
WEIGHTING - When using the Jacobi algorithm, the analysis can be run with a weighting of either the rare or the common species. See Orloci (1978, pp. 152-168) for details of these methods of weighting. Also, the scores can be adjusted to percentages. The data file should have species as the rows and samples as the columns, as in the PCA procedure.

DOWNWEIGHT RARE SPECIES - MVSP follows Hill's DECORANA program in allowing the rare species to be downweighted before the analysis. This is only available when the reciprocal averaging algorithm is used. It can be useful if you want most weight to be given to the common species, but you still want to see how the rarer taxa are affected. Those taxa that occur in fewer than 1/5 the number of samples that the most common taxon occurs in will be downweighted. The amount that the species is downweighted is related to its frequency of occurrence.

SEGMENTS FOR DETRENDING - This option sets the number of segments the axis should be divided into for the detrending process. The default value, 26, should be adequate for most analyses, but if the detrending does not seem to be as effective as it could be a larger number can be tried.

RESCALING CYCLES - When detrending is in force, the axes can also be rescaled so that the points at the end are not closer together than those in the middle. This rescaling is done several times and this option allows you to vary the number of times. It is generally not advisable to change this from the default of 4, however, as further rescaling may reduce the effectiveness of the ordination. Rescaling may be bypassed by entering 0 for this option.

Distances and similarities:

This procedure calculates a variety of distance and similarity measures. The distances are calculated between the columns of the data matrix. An option to transpose the data matrix is included, to allow analysis of the rows without requiring re-entry of the data.
There are numerous publications that discuss different types of measures. I have relied on the following in implementing the formulae used in this procedure: Prentice (1980), Sneath & Sokal (1973), Pielou (1984), Greig-Smith (1983), Gordon (1981), and Everitt (1980). You may refer to these for details about the measures provided in MVSP.

MATRIX OUTPUT - This procedure is set up to allow easy input of the resulting symmetric matrices into the cluster analysis and PCO procedures. If you choose to input the distance matrix into these, a copy of it, along with the appropriate header information, will be put into a file. This matrix file can then be used as input to the other analyses. When the procedure is run, another filename must be specified for this matrix file. This filename defaults to the symmetric matrix default extension. You may use the matrix output option to specify the type of matrix (e.g. upper or lower half matrix, diagonal present or absent).

COEFFICIENT - There are presently eighteen measures available. These, and their formulae, are listed below. In these formulae, i and j represent two columns of the data matrix, k represents the rows, and therefore $X_{ik}$ would be the datum in the kth row of column i. Following the name of each measure is the marker placed in the output file created by the "Distances and Similarities" procedure (see section on "Data file format"). This marker identifies the coefficient that was used to calculate the matrix. It is checked by the cluster analysis procedure when the minimum variance strategy is used. Minimum variance clustering can only be performed on squared Euclidean distances, so this marker allows the program to ensure that the correct distance is being used.

Euclidean distance (EUCLID):

$$Ed_{ij} = \left( \sum_k (X_{ik} - X_{jk})^2 \right)^{1/2}$$

Squared Euclidean distance (SEUCLID):

$$SEd_{ij} = \sum_k (X_{ik} - X_{jk})^2$$

Standardized Euclidean distance (STEUCLID):

$$StEd_{ij} = \left( \sum_k \left( \frac{X_{ik} - X_{jk}}{sd_k} \right)^2 \right)^{1/2}$$

where: $sd_k$ = standard deviation of all the elements of k

Cosine theta (or normalized Euclidean) distance (COSINE):

$$CTd_{ij} = \left( \sum_k \left( \frac{X_{ik}}{ss_i} - \frac{X_{jk}}{ss_j} \right)^2 \right)^{1/2}$$

where: $ss_x = \left( \sum_k (X_{xk})^2 \right)^{1/2}$, with x standing for i or j

Manhattan metric distance (MANHAT):

$$MMd_{ij} = \sum_k |X_{ik} - X_{jk}|$$

Canberra metric distance (CANBER):

$$CMd_{ij} = \sum_k \frac{|X_{ik} - X_{jk}|}{X_{ik} + X_{jk}}$$

Chord distance (CHORD):

$$Cd_{ij} = \left( \sum_k \left( X_{ik}^{1/2} - X_{jk}^{1/2} \right)^2 \right)^{1/2}$$

Chi-square distance (formula X2 of Prentice, 1980) (CHISQR):

$$CSd_{ij} = \left( \sum_k \frac{(X_{ik} - X_{jk})^2}{\sum_l X_{lk}} \right)^{1/2}$$

Average distance (AVERAGE):

$$Ad_{ij} = \left( \frac{\sum_k (X_{ik} - X_{jk})^2}{n} \right)^{1/2}$$

where: n = number of elements in each variable (i or j)

Mean character difference distance (MEANCHAR):

$$MCDd_{ij} = \left( \frac{\sum_k |X_{ik} - X_{jk}|}{n} \right)^{1/2}$$

where: n = number of elements in each variable (i or j)

Pearson product moment correlation coefficient (PEARS):

$$PCC_{ij} = \frac{\sum_k (X_{ik} - \bar{X}_i)(X_{jk} - \bar{X}_j)}{\left( \sum_k (X_{ik} - \bar{X}_i)^2 \right)^{1/2} \left( \sum_k (X_{jk} - \bar{X}_j)^2 \right)^{1/2}}$$

Spearman rank order correlation coefficient (SPEAR):

$$SCC_{ij} = 1 - \frac{6 \sum_k (R_{ik} - R_{jk})^2}{n^3 - n}$$

where: R = rank order of element in variable

Percent similarity coefficient (PERCENT):

$$PSc_{ij} = 200 \, \frac{\sum_k \min(X_{ik}, X_{jk})}{\sum_k (X_{ik} + X_{jk})}$$

where: min = minimum of two values

Gower general similarity coefficient (GOWER):

$$GGSc_{ij} = \frac{\sum_k w_{ijk} \, s_{ijk}}{\sum_k w_{ijk}}$$

where:

$s_{ijk} = 1 - \frac{|x_{ik} - x_{jk}|}{\mathrm{range}(k)}$ for quantitative data,
$s_{ijk} = 1$ for matches of binary or multistate data,
$s_{ijk} = 0$ for all mismatches;

$w_{ijk} = 0$ for negative matches of binary data,
$w_{ijk} = 1$ in all other situations.

For this coefficient, the data type for each variable (row) must be declared. This is done through the first two characters of the data labels: those beginning with "B_" are taken to be binary, those with "M_" multistate, anything else is considered quantitative.
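The label rule and the Gower formula above can be tried out with a short Python sketch. This is an illustration only, not the program's own code: the function names variable_type and gower are hypothetical, range(k) is assumed to be the range of row k over all columns of the data matrix, and any non-zero value of a binary variable is assumed to mean presence.

```python
def variable_type(label):
    """Data type declared by the first two characters of a row label."""
    if label.startswith("B_"):
        return "binary"
    if label.startswith("M_"):
        return "multistate"
    return "quantitative"

def gower(labels, data, i, j):
    """Gower similarity between columns i and j; rows of data are variables."""
    numerator = denominator = 0.0
    for label, row in zip(labels, data):
        kind = variable_type(label)
        xi, xj = row[i], row[j]
        w = 1.0
        if kind == "quantitative":
            rng = max(row) - min(row)       # assumed meaning of range(k)
            s = 1.0 - abs(xi - xj) / rng if rng else 1.0
        elif kind == "binary":
            if xi == 0 and xj == 0:         # negative match: ignored (w = 0)
                w, s = 0.0, 0.0
            else:
                s = 1.0 if (xi != 0) == (xj != 0) else 0.0
        else:                               # multistate: match or mismatch
            s = 1.0 if xi == xj else 0.0
        numerator += w * s
        denominator += w
    return numerator / denominator if denominator else float("nan")

# Hypothetical data: three variables (rows) measured on three flowers (columns).
labels = ["B_SEPAL", "M_COLOUR", "LENGTH"]
data = [[1, 1, 0],
        [2, 2, 3],
        [10.0, 12.0, 20.0]]
similarity = gower(labels, data, 0, 1)   # (1 + 1 + 0.8) / 3
```

Here the first two flowers agree on both qualitative variables and differ by a fifth of the observed range in length, so their similarity is 2.8/3.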
For instance, a variable indicating the presence or absence of sepals in a flower would have the label B_SEPAL, that indicating the colour of the petals (one of four possible) would be named M_COLOUR, and petal length would be recorded in the row with the label LENGTH.

The following binary (presence/absence) coefficients are based on a table of frequency of matches and mismatches of the presence or absence of a single variable. The binary data should be entered into the data matrix as 0 (zero) and 1 (one). Any number that is not zero is also treated as a one, indicating presence.

                               Sample j
                         Presence    Absence
                      ┌───────────────────────┐
    Sample i Presence │     a           b     │
                      │                       │
             Absence  │     c           d     │
                      └───────────────────────┘

Sorensen's coefficient (SOREN):

$$Sc_{ij} = \frac{2a}{2a + b + c}$$

Jaccard's coefficient (JACCA):

$$Jc_{ij} = \frac{a}{a + b + c}$$

Simple matching coefficient (MATCH):

$$SMc_{ij} = \frac{a + d}{a + b + c + d}$$

Yule coefficient (YULE):

$$Yc_{ij} = \frac{ad - bc}{ad + bc}$$

Cluster analysis:

This procedure performs hierarchical agglomerative cluster analysis of an input matrix of distance or similarity measures. Seven forms of clustering are presently available: the four average linkage procedures (unweighted pair group, unweighted centroid, weighted pair group, and weighted centroid [or median]), nearest and farthest linkage, and minimum variance. The actual algorithm is based on Lance & Williams' (1966) generalized clustering procedure. For clear and concise explanations of the theory and practice behind cluster analysis, see Sneath and Sokal (1973), Everitt (1980), Greig-Smith (1983), and Pielou (1984).

MATRIX INPUT - A number of different input formats are available, including various forms of half matrices and full matrices. This defaults to the same form specified in the Matrix Output option of the Distances and Similarities procedure.
TREE DESCRIPTION FILE - When the clustering is finished, you can have a description of the resulting dendrogram output to a file. This description is in the form of labels enclosed in parentheses and separated by commas, which delimit the clusters. Also, after each label and closing bracket is the distance between that object or group and the next in the hierarchy. An example of this description is:

    ((LENGTH:125.71,WIDTH:125.71):170.50,HEIGHT:296.21);

This would correspond to a dendrogram of the form:

<Graphic placed here in printed manual>

Christopher Meacham has written a program called PLOTGRAM which can be used to plot dendrograms and cladograms described in the above format. For PLOTGRAM to properly read the description produced by MVSP, the following two options must be set:

    DIAGRAMTYPE Y
    TIPS Y

PLOTGRAM is no longer included with MVSP, since MVSP can now automatically plot dendrograms (see below). If you wish to get a copy of PLOTGRAM that will work with MVSP-generated files, one may be obtained, along with the Pascal source code, at cost from Kovach Computing Services. We cannot, however, provide support for the program or endeavour to add new types of printers.

TREE ORDER FILE - MVSP can also produce a file in which the data labels are listed in the order they occupy in the dendrogram. This type of file can be read by the program SORTDATA, which accompanies the registered version of MVSP. This program is useful for producing combination dendrograms in which two dendrograms, one for the columns of the data matrix and another for the rows, are plotted together with the original data matrix in between in graphic form (see example below; also Kovach, 1988a,b; 1989 and Duigan & Kovach, 1991). This allows you to see how the data are affecting the clustering. See the Utilities section below for details about SORTDATA.
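To make the tree description format under TREE DESCRIPTION FILE more concrete, here is a small Python sketch that rebuilds the example string from a list of fusions. It is inferred from that single example and is not MVSP's own routine: the function name tree_description and the way fusions are passed in (a label for a single object, an integer for an earlier node) are assumptions. The value written after a closing bracket is taken to be the difference between the two fusion levels, which agrees with the example (296.21 - 125.71 = 170.50).

```python
def tree_description(merges):
    """Build a parenthesized tree description from (group1, group2, level) fusions.

    A group is either an object label (str) or the 1-based number (int)
    of an earlier fusion, listed in the order the fusions were made.
    """
    nodes = {}   # node number -> (description text, fusion level)

    def branch(member, parent_level):
        if isinstance(member, int):
            text, level = nodes[member]
            # Group: distance from its own fusion level up to the parent's.
            return "%s:%.2f" % (text, parent_level - level)
        # Single object: distance up to the level at which it is fused.
        return "%s:%.2f" % (member, parent_level)

    for number, (group1, group2, level) in enumerate(merges, start=1):
        text = "(%s,%s)" % (branch(group1, level), branch(group2, level))
        nodes[number] = (text, level)
    return nodes[len(merges)][0] + ";"

description = tree_description([("LENGTH", "WIDTH", 125.71),
                                (1, "HEIGHT", 296.21)])
```

For the two fusions shown, the result is the same string as the example given in the manual.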
<Graphic placed here in printed manual>

RANDOMIZE INPUT ORDER - There have recently been some suggestions (Bayer, 1985; Lespérance, 1990) that input order of the data matrix can affect the results of clustering with certain types of data sets. Changing the input order can not only change the order of objects in the dendrogram but more importantly can also cause some objects to be joined to different clusters. This is particularly possible when two or more pairs of objects have identical similarities either at the beginning or after recalculation during the clustering procedure.

Normally the clustering procedure scans through the similarity matrix sequentially looking for the next pair of objects to fuse. Choosing the "Randomize" option causes the matrix to be scanned in a random order which changes each time the procedure is run. In order to check for chaotic behaviour in clustering, try running two or three clusterings of the same data matrix with this option set, then compare the dendrograms. Note that changes in the actual order of objects in the dendrogram are to be expected; a cluster diagram can be viewed as a 'mobile' hanging from a ceiling in which the different clusters can rotate around. It is the branching order in the dendrogram that is important and this is what should be compared when testing for chaotic behaviour.

CONSTRAINED CLUSTERING - As stated above, normally the actual order of objects in the dendrogram is not important. However, if you are working with sequential data (such as in stratigraphic geological studies), a special constrained form of cluster analysis can be used (Birks & Gordon, 1985; Kovach, in press). When this option is chosen, clustering proceeds as usual except that the objects to be fused are constrained to be adjacent in the data matrix. Therefore, the dendrogram that is produced will have the objects in the same order as the input matrix.
This type of constraint can often cause distortion in the dendrogram. In particular, reversals often occur where the distance (and therefore the branching level) between two objects is greater than that between the cluster of those two and the next object in the hierarchy. In sequences where there is a lot of variability, this can cause the dendrogram to be almost uninterpretable.

OUTPUT - The output of the procedure consists of a report of the status of the clustering procedure as each new object is added to the cluster. The average similarity or distance of the two groups that have just been joined is printed out, along with a listing of the two groups and the number of objects in the newly fused group. If a single object is added to another cluster, the label for that object (or a numerical label corresponding to its position in the data matrix) is printed out. If a whole group is added, the node at which that group was last added to is printed out. For instance, a report such as:

                                                NUMBER OF OBJECTS
    NODE   GROUP 1   GROUP 2   DISSIMILARITY    IN FUSED GROUP
      1    LENGTH    WIDTH        125.706              2
      2    NODE 1    HEIGHT       296.206              3

would correspond to the dendrogram shown previously.

The results of the cluster analyses are also automatically displayed as dendrograms. These may either be text-based or drawn in graphics mode, depending on the setting of the "Scatterplot/Dendrogram Type" option of the "Graphics Output" menu (under the "Program Defaults" menu). The text-based dendrograms will automatically be directed to the same file or printer that the results are going to. Graphics dendrograms may be printed by pressing "P" when the dendrogram is on the screen. See the "Printer Setup" options on page 13 for more information.

Diversity indices:

This procedure computes three diversity indices commonly used in ecology: Simpson's, Shannon's, and Brillouin's. See Pielou (1969) for a discussion of the use and derivation of these indices.
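For readers who want to check results by hand, the three indices can be sketched in Python using their usual textbook definitions. This is not MVSP's code, and the function name diversity and its base parameter are hypothetical. Simpson's index is published in several forms (the sum of squared proportions, its complement, or its reciprocal); the sketch returns the plain sum of squared proportions, and which form MVSP reports is not stated here. The Brillouin index is computed from log factorials, obtained through the log-gamma function rather than by multiplying out the factorials, and assumes whole-number counts.

```python
import math

def diversity(counts, base=math.e):
    """Simpson, Shannon and Brillouin indices for one sample of species counts."""
    counts = [c for c in counts if c > 0]       # absent species do not contribute
    total = sum(counts)
    proportions = [c / total for c in counts]
    # Simpson: probability that two individuals belong to the same species.
    simpson = sum(p * p for p in proportions)
    # Shannon: -sum(p * log p), in the chosen logarithm base.
    shannon = -sum(p * math.log(p, base) for p in proportions)
    # Brillouin: (log N! - sum(log n_i!)) / N, with lgamma(x + 1) = ln(x!).
    log_factorials = math.lgamma(total + 1) - sum(math.lgamma(c + 1) for c in counts)
    brillouin = log_factorials / math.log(base) / total
    return {"species": len(counts), "simpson": simpson,
            "shannon": shannon, "brillouin": brillouin}

# Hypothetical sample: four species with ten individuals each.
result = diversity([10, 10, 10, 10])
```

For four equally abundant species the Shannon index equals the natural log of 4 and the sum of squared proportions is 0.25.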
The input data file should be set up with species as rows and samples as columns. The diversity, then, is calculated for each column. Be forewarned that the Brillouin index calculates factorials of the species abundances, and if any of your abundances are high, this could take a very long time! For abundances greater than 1000 the factorial is estimated using Stirling's formula, which is much faster and, at these abundances, provides a close approximation.

The LOG BASE option allows you to specify whether to use logarithms to the base 10, 2, or e. The output consists not only of the diversity index, but also the number of species and the evenness, which is defined as the diversity divided by the log of the number of species.

UTILITIES

Sortdata:

In many of my analyses I perform clusterings of both the samples and species. I've found it very valuable to present the resulting two diagrams with the original data matrix in between, sorted in the order of the dendrograms. The data can be split into abundance classes, which are represented by different characters, so that the differing abundances can be seen at a glance. In this way the structure revealed by the cluster analyses can be seen directly in the data matrix (see Kovach, 1988a,b; 1989 for some examples).

SORTDATA is a utility I've written to help produce these diagrams. It is only included with the registered version of MVSP. To produce one of these combination diagrams, you must first run two cluster analyses of the same data matrix, one with the matrix transposed, the other not. Make sure that the "Tree Order" option is turned on. This will produce two files with the order of the objects in the dendrogram for SORTDATA.
Next run SORTDATA with the following parameters:

    SORTDATA datafile.MVS order1.ORD order2.ORD [output.SRT]

where "datafile.MVS" is your original data file used for input to the distance and similarity procedure, "order1.ORD" and "order2.ORD" are the tree order files for analyses of the transposed and non-transposed matrices, and "output.SRT" is the file that will contain the sorted data matrix. If "output.SRT" is missing the output will be put into a file named "datafile" with a .SRT extension.

When the program is run, it will first read the original data matrix, determine the lowest and highest data values, and then ask you to define the ranges of four data classes. First you must enter the value below which no symbol is plotted; if your data are counts, this value will be 1. Then enter the cutoff points between the four classes. When you are done the program will sort the data and translate them to the abundance classes, placing the results in a file.

To assemble the resulting diagram, you must first print out the sorted data matrix. You can use your word processor to print it, perhaps adjusting the character font (pica, elite, or condensed) and the line spacing to fit the diagram on one page. Then measure the width and length of the matrix and use MVSP to produce the dendrograms, setting the "Plot Height" and "Plot Width" parameters on the "Printer Setup" menu to the appropriate length (in cm) so that the dendrogram will be the same size as the sorted data matrix. Alternatively you can reduce or enlarge dendrograms you've already plotted with a photocopier or photographically. The whole diagram may then be assembled on a large (A3 or 11"x17") piece of paper.

To obtain the registered version of MVSP, with the SORTDATA utility, see page 4, the REGISTER.DOC file, or the "Register" option on the main menu.

DISCLAIMER

The accuracy of this program has of course been extensively tested against the results of other programs.
However, unforeseen errors in computation can and have crept up even in the most sophisticated and widely used statistical packages. You may wish to initially run comparisons with the results of other programs, using your own data set, to ensure that it is working properly with your type of data.

Note when running comparisons that there are often many methods of computing the same routine, and results may vary, especially in the more complex eigenanalysis procedures. In principal components analysis, for instance, there are numerous ways of transforming the data before eigenanalysis (see Greig-Smith, 1983, pp. 247ff), and the component loadings can be scaled either to unity (as they are here) or to the variance of that principal component, or in other manners. Also, the eigenanalysis can rotate the cloud of points in different directions, so that signs of the scores are reversed and the actual values different. The configuration of the points will be the same, however.

If you do run into any problems with this program, whether they be in the results or abnormalities in the running of the program, please contact me by post or through electronic mail at the addresses given on the cover page. Please give full details of the problem and, if possible, the data set which you were running when the bug cropped up.

Please note that no warranty is given for this program. The author (Warren L. Kovach) shall not be legally liable for any damages or lost profits arising from use or misuse of this program. Refer to the "Limited Warranty" section on page 5 for full details.

80x87 SUPPORT

If you aren't satisfied with the speed of this program, a faster version that uses the 80x87 math coprocessor is distributed with MVSP Plus, the registered version of MVSP. This coprocessor (which is an optional chip that can be plugged into your computer) greatly speeds up the processing of real number, floating point arithmetic.
Often this increase in speed can amount to 10 times! This is particularly noticeable for calculations that use logarithms and trigonometric functions. The calculation of the Brillouin diversity index, which uses log factorials, for an 84x84 matrix took 9 minutes 14 seconds without a math chip but 2 minutes 41 seconds with one. A PCA, which uses mostly arithmetic operations, of a 45x45 data matrix took one hour with the standard version of the program, but only twenty minutes with the 80x87 version (tests run on a 12 MHz 80286 based Compaq Portable III).

Borland Pascal, the compiler used in developing MVSP, has an option for creating programs that take advantage of this processor. The programs compiled using this option will only work on machines that have the 80x87 installed. The installation program will detect whether your computer has a math chip and will install the appropriate version.

To obtain the registered version of MVSP, with 80x87 support, see page 4, the REGISTER.DOC file, or the "Register" option on the main menu.

PROTECTED MODE VERSION

New with MVSP ver. 2.1 is a special protected mode version. It is only provided with MVSP Plus, the registered version of MVSP. This is compiled to run in what is called "protected mode" on the Intel 80286, 80386, 80486, and Pentium microprocessors. The primary advantage in using protected mode, as opposed to the "real mode" which is compatible with the older 8088 and 8086 chips, is that all of the RAM in the machine can be directly accessed, whereas real mode programs are limited to 640Kb of RAM. So, if you have 16 Mb of RAM in your computer, MVSP in protected mode can directly use all of it for storing data while calculating. Under real mode, MVSP would have to dump parts of large arrays onto disk while calculating, which slows down the process tremendously.
For example, a cluster analysis of a 400x400 matrix, which needs to store 937kb of data on disk under real mode, took 191 minutes whereas the same analysis under protected mode took only 41 minutes (test run on a 16 MHz 386SX based machine with 5 Mb RAM).

MVSP Protected Mode requires a computer with an 80286 or higher microprocessor and at least 2 Mb of RAM. If your computer has these capabilities, then the MVSP installation program will give you the option of installing the protected mode version as well.

The protected mode version of MVSP was produced using Borland International's Borland Pascal 7.0 compiler. The program runs under the DPMI (DOS Protected Mode Interface) standard. This requires that a DPMI compatible server is installed. Some 386 memory managers (such as QEMM 386) provide optional DPMI support, as does Microsoft Windows 3.x in enhanced mode. If you do not have a DPMI server installed, then Borland's server will automatically be loaded and used. You need do nothing except type the command MVSPPROT at the DOS prompt.

For MVSP Protected Mode to work properly, two files must be present in the same directory as the MVSPPROT.EXE program. These are DPMI16BI.OVL and RTM.EXE. These are files produced by Borland International. They will be automatically installed along with MVSP Protected Mode. Accompanying them is a documentation file called DPMIUSER.DOC that explains more about running protected mode programs, as well as discussing some other utility programs that are included with MVSP Protected Mode. If you have any problems running the protected mode version refer to this documentation file first.

To obtain the registered version of MVSP, with the protected mode version, see page 4, the REGISTER.DOC file, or the "Register" option on the main menu.
APPENDICES

The printed manual with the registered version of MVSP has several large appendices at this point, describing the example data files, listing error messages and their meanings, explaining the format of the configuration file, and giving information on efficient memory management. These have been omitted from the shareware version for brevity.

REFERENCES

Aitchison, J., 1986. The Statistical Analysis of Compositional Data. Chapman and Hall, London.

Bayer, U., 1985. Pattern recognition problems in geology and palaeontology. Lecture Notes in Earth Sciences, 2. Springer-Verlag.

Birks, H.J.B. & Gordon, A.D., 1985. Numerical Methods in Quaternary Pollen Analysis. Academic Press, London.

Cooke, D., Craven, A.H., & Clarke, G.M., 1982. Basic Statistical Computing. Edward Arnold (Publishers) Ltd., London.

Davis, J.C., 1986. Statistics and Data Analysis in Geology, 2nd Edition. John Wiley & Sons, New York.

Duigan, C.A. & Kovach, W.L., 1991. A study of the distribution and ecology of littoral freshwater chydorid (Crustacea, Cladocera) communities in Ireland using multivariate analyses. Journal of Biogeography, 18:267-280.

Everitt, B., 1980. Cluster Analysis. 2nd Edition. Gower Publishing Co., Hampshire, 136 pp.

Gauch, H.G. Jr., 1982. Multivariate Analysis in Community Ecology. Cambridge University Press, New York.

Gordon, A.D., 1981. Classification. Chapman and Hall, London.

Greenacre, M.J., 1984. Theory and applications of correspondence analysis. Academic Press, London.

Greig-Smith, P., 1983. Quantitative Plant Ecology. University of California Press, Berkeley.

Hill, M.O., 1973. Reciprocal averaging: An eigenvector method of ordination. Journal of Ecology, 61:237-249.

Hill, M.O., & Gauch, H.G. Jr., 1980. Detrended correspondence analysis: An improved ordination technique. Vegetatio, 42:47-58.

Jolicoeur, P., & Mosimann, J.E., 1960. Size and shape variation in the Painted Turtle. A principal component analysis. Growth, 24:339-354.

Jolliffe, I.T., 1986. Principal Component Analysis. Springer-Verlag, New York.

Kent, M., & Coker, P., 1992. Vegetation description and analysis. A practical approach. Belhaven Press, London.

Kovach, W.L., 1988a. Multivariate methods of analyzing paleoecological data. In: W.A. DiMichele & S.L. Wing (eds.), Methods and applications of plant paleoecology. The Paleontological Society Special Publication, 3:72-104.

Kovach, W.L., 1988b. Quantitative palaeoecology of megaspores and other dispersed plant remains from the Cenomanian of Kansas, USA. Cretaceous Research, 9:265-283.

Kovach, W.L., 1989. Comparisons of multivariate analytical techniques for use in pre-Quaternary plant paleoecology. Review of Palaeobotany and Palynology, 60:255-282.

Kovach, W.L., in press. Multivariate techniques for biostratigraphical correlation. Journal of the Geological Society, London.

Kovach, W.L. & Batten, D.J., in press. Association of palynomorphs and palynodebris with depositional environments: quantitative approaches. In: Traverse, A. (ed.), Sedimentation of Organic Particles. Cambridge University Press.

Lance, G.N. & Williams, W.T., 1966. A generalized sorting strategy for computer classifications. Nature, 212:218.

Legendre, L., & Legendre, P., 1983. Numerical Ecology. Elsevier Scientific Publishing Company, New York.

Lespérance, P.J., 1990. Cluster analysis of previously described communities from the Ludlow of the Welsh Borderland. Palaeontology, 33:209-224.

Manly, B.F.J., 1986. Multivariate statistical methods. A primer. Chapman & Hall, London.

Noy-Meir, I., 1973. Data transformations in ecological ordination. I. Some advantages of non-centering. Journal of Ecology, 61:329-341.

Orloci, L., 1978. Multivariate Analysis in Vegetation Research, 2nd edition. W. Junk, Boston.

Pielou, E.C., 1969. An Introduction to Mathematical Ecology. Wiley-Interscience, New York.

Pielou, E.C., 1984. The Interpretation of Ecological Data. Wiley-Interscience, New York.

Prentice, I.C., 1980. Multidimensional scaling as a research tool in Quaternary palynology: A review of theory and methods. Review of Palaeobotany and Palynology, 31:71-104.

Sneath, P.H.A., & Sokal, R.R., 1973. Numerical Taxonomy. W.H. Freeman & Co., San Francisco.

Sokal, R.R. & Rohlf, F.J., 1981. Biometry. 2nd Edition. W.H. Freeman & Co., San Francisco.

ter Braak, C.J.F., 1986. Canonical correspondence analysis: A new eigenvector technique for multivariate direct gradient analysis. Ecology, 67:1167-1179.

OTHER PRODUCTS FROM KOVACH COMPUTING SERVICES

Wa-Tor for Windows - A population ecology simulation program for Microsoft Windows. Pit hungry sharks against tasty fish in an endless ocean. You can set the initial numbers of fish and sharks, their birth rates, and the shark starvation time. The population fluctuations may be watched on the graphic display and via three types of data plots. Based on an idea from Scientific American. Price: £10.

Coming soon: Oriana - Orientation analysis for the IBM-PC and compatibles. This program calculates circular statistics on orientation data measured in degrees. It calculates the circular mean, standard deviation, and polar coordinates of a sample and compares pairs of samples using Watson's F-test and the Chi-square test. Price: TBA.

Coming soon: Rarefact - Rarefaction analysis for the IBM-PC and compatibles. This program estimates ecological diversity while taking into account differing sample sizes. It uses Hurlbert & Simberloff's method of estimating the expected number of taxa in a random sample of a certain size, allowing samples to be compared on an equal basis. It also calculates the variance around the estimate. Price: TBA.

Consulting

Do you have a data analysis problem but don't have the time to do it properly or would rather have an expert do it? Then contact Kovach Computing Services.
We provide data analysis services using all appropriate methods in any field. Services include publication quality graphics and full reports describing the results and providing comments on their robustness. ---- Kovach Computing Services 85 Nant-y-Felin Pentraeth, Anglesey LL75 8UY Wales U.K.