Share Gallery 1


Text File | 1991-11-15 | 103KB | 2,599 lines

TexaSoft's USING KWIKSTAT
Reference Guide, Condensed Version

(C)Copyright 1991 Alan C. Elliott
All rights reserved. No part of this manual may be reproduced without prior permission. For information, address TexaSoft, P.O. Box 1169, Cedar Hill, Texas 75104. CIS:70721,3145

No patent liability is assumed with respect to the use of the information contained herein. While every precaution has been taken in the preparation of this publication, the publisher assumes no responsibility for errors or omissions. Neither is any liability assumed for damages resulting from the use of the information herein.

This shareware copy of the program is made available so you can "try it before you buy it." When you register, you will receive the latest version of the program, license to use the program on a regular basis, a 288 page printed manual, 3 months of support, a newsletter and more.

-----------------------------------------------
For more information, print these files

   Registration form                  - KSORDER.TXT
   Site license                       - SITELICE.DOC
   Updated information                - LATENEWS.DOC
   Detailed installation instructions - KSINSTAL.DOC
-------------------------------------------------

CONDENSED TABLE OF CONTENTS
---------------------------
Part I:   An Overview of KWIKSTAT
Part II:  Using the KWIKSTAT Database
Part III: A Review of Statistical Concepts
Part IV:  Performing A Statistical Analysis
   o Descriptive statistics
   o t-tests and analysis of variance
   o Non-parametric comparative procedures
   o Regression analysis
   o Crosstabulations and Chi-Square
   o Life tables and survival analysis
Part V:   Using KWIKSTAT Utilities
   o Export data from a database to an ASCII file
   o Produce a printed report
   o Import 1-2-3 type files
   o Create and edit images for pictograph procedure
Appendices: Error Codes/ Problem Form/ Ballot

--------------- Please Register 1 KWIKSTAT 3

PART I - AN OVERVIEW OF KWIKSTAT
-------------------------------------------------
KWIKSTAT is for people who need to summarize, analyze or interpret
numerical information. It will help you decide what kind of analysis is appropriate, read the data you already have on your computer (from a variety of file types) or from the keyboard, perform the analysis and offer interpretation of the results. Unlike older programs such as SPSS or SAS, you do not have to be a professional statistician or programmer to use KWIKSTAT effectively.

REQUIREMENTS: 100% compatible computer, including the IBM PS/2 computers, 384K RAM, CGA, EGA, VGA or Hercules compatible monitor. Many printers are supported.

INSTALLATION

Detailed installation procedures for KWIKSTAT are in the file KSINSTAL.DOC. For quick installation on a hard disk, place the KWIKSTAT disk (1) in the A: drive and enter:

     A:INSTALL

Follow the instructions on the screen.

USING THE KWIKSTAT MENU

The main KWIKSTAT menu uses a pull-down menu interface. When you start KWIKSTAT with the KS command from the DOS prompt (after it has been installed), the Copyright screen appears, followed by the main KWIKSTAT "DATA" menu. (If the ANALYZE menu appears instead of the DATA menu, press the left arrow key once, and the DATA menu will appear.)

The top line of the menu is a menu bar. This bar contains the options "Data", "Analyze" and "Helps". These are the three main options for KWIKSTAT. Using the right and left arrow keys, you can move between the options. To select an option from an extended (pulled-down) menu, use the up and down arrow keys on the cursor pad to highlight the option you desire, then press the Enter key. Or, to select an option from a pull-down menu, press the first letter of the option name.

To exit KWIKSTAT, choose Quit - Exit from the DATA menu or press the Esc key.

USING THE ANALYZE MENU

The KWIKSTAT Analyze menu allows you to choose which analysis module to run. See the section titled "TUTORIAL: Try this Example".

USING THE KWIKSTAT HELP SYSTEM

The "HELPS" pull-down menu contains the following choices:

   o HELP ON USING THE PROGRAM OPTION - general help
   o DECIDE WHAT ANALYSIS TO USE OPTION - what analysis to use
   o ABOUT KWIKSTAT - copyright and order information
   o GO TO DOS, RETURN WITH EXIT (SHELL) - temporarily go to DOS prompt
   o CHANGE SETUP - select default directory, printer, monitor
   o SET MONITOR COLOR - select monitor colors

--------------------------------------------------------------------
DO THIS TUTORIAL EVEN IF YOU DON'T READ ANY MORE OF THE MANUAL.
--------------------------------------------------------------------

This tutorial will give you a feeling for how to use KWIKSTAT. It assumes you are using KWIKSTAT on a hard disk. To begin KWIKSTAT, you must first be in the \KWIKSTAT directory on your hard disk. Use the CD (Change Directory) command from the DOS prompt to change to the \KWIKSTAT directory by using the command:

     CD\KWIKSTAT

From the \KWIKSTAT directory, begin KWIKSTAT with the command:

     KS

After the copyright information, the Data pull-down menu will appear. (If the Analyze menu appears, press the left arrow key once to open the Data menu.)

ACCESSING THE KWIKSTAT HELP SCREEN

To examine the KWIKSTAT HELP menu, press the F1 function key. (This help screen is available from any menu.) The HELP menu lists major topics, and the screen number. You can think about the HELP procedure as a book, with screens instead of pages. To look at a particular topic, enter the screen number you desire. For example, to look at screen 7, type 7 and press Enter. KWIKSTAT displays screen 7. Once you have displayed screen 7, to move to screen number 8, press Enter. To go back to the menu, press the "M" key. To exit the HELP module, press the Enter key from the main Help menu or the Esc key from a help screen.

Press Enter now. This takes you back to the KWIKSTAT Data pull-down menu. Every module has the help screens available.
The KWIKSTAT "Decision" help screen is available from the Helps pull-down menu. To look at this help menu, use the right arrow key to move to the Helps pull-down menu. Then use the down arrow key to highlight "Decide what analysis to use" and press Enter.

EXAMPLE OF DESCRIPTIVE STATISTICS

Entering data from the keyboard is explained later in Part II, "Using the KWIKSTAT database". This example will use the database named EXAMPLE on disk. To open this database, use your arrow keys to move to the Data pull-down menu. Select "Open a Database". A "Pick" menu of available database names will appear. Use the up and down arrow keys to highlight EXAMPLE and press Enter. (If the EXAMPLE database does not appear on the list of databases, you may not have installed the program correctly. Review the installation instructions.)

Once the database is opened, a notice at the bottom of the screen tells you that the database is open. Press the "L" key to choose the List the Contents option. This will list the contents of the EXAMPLE database to the screen. Press Enter several times to list the entire database to the screen. When the list is finished, you will return to the Data pull-down menu.

Use the right arrow key to move to the Analyze pull-down menu. Choose the Descriptive Statistics and Graphs option from the Analyze menu by highlighting it and pressing Enter. KWIKSTAT now switches to the Descriptive module (which may take a few seconds).

From the Descriptive Statistics menu, press the letter B (or highlight the "B" option and press Enter) to choose "Detailed statistics on a single variable." The program now displays the variables available for analysis from the database. Choose variable number 2 (AGE) by typing a 2, then press Enter.

Before the statistics for this variable are displayed, two options are presented. First, you are prompted with the question:

     Specify Confidence Interval level (.5 to .99) (Default is .95)

For this example, PRESS ENTER TO ACCEPT THE DEFAULT.
Second, you are asked about percentiles:

     Default for percentiles is Tukey 5 Number Summary
     Specify your own percentiles to calculate (y/N)?

When a Yes/No question appears on the screen, notice that either the Y or the N will be uppercase (in this case it is (y/N)). This means that if you press Enter without entering a Y or an N, the uppercase option is the default (No). For this example, to choose No to the question, JUST PRESS ENTER.

The program will display a screen of descriptive statistics, and a box plot of the data. Notice that this screen is different from previous screens. The information on this screen is displayed in graphics mode (if you have a graphics monitor). Normally, information on the screen is in "text" mode. When graphs are displayed on the screen, the program must use a graphics screen mode. This graph appears in black and white, although some graphs will appear in color.

On graphic screens, a menu will appear at the bottom of the screen for a few seconds, then disappear. This allows you to capture or print the screen without the menu appearing on your printout. To bring the menu back, press the spacebar once. The menu options are still available even when the menu is not visible. The menus differ according to your setup and particular options available for the graphic display, but most graphic menus will include the following options:

     Esc:Exit  R:Replot  P:Print

Press Esc to end the display, press R to replot (choose other display options) and P to print the graphic screen to the printer. For example, if you want a printed copy of this graphics screen, MAKE SURE YOUR PRINTER IS TURNED ON, and is ON LINE, and HAS PAPER. Then, press "P" (for Print).

IF THE SCREEN DOES NOT PRINT PROPERLY: You may not have your printer graphics command properly implemented - review the installation procedures and technical considerations in the appendix and the file PRINTERS.DOC.

To return to the main Descriptives menu, press Esc.
To end this module and return to the main KWIKSTAT menu, press Esc. To end KWIKSTAT from the main menu, press Esc again and answer Y to the prompt "End KWIKSTAT."

Procedures are explained more fully later in the manual. However, you may find that you will be able to use most of the KWIKSTAT features without any further aid from the manual. Remember, you have three sources of information if you need help: (1) the regular help menu (F1), (2) the help procedure that will assist you in choosing the right statistical analysis to use (main menu, Helps) and (3) the manual.

IF SOMETHING GOES WRONG

If an error code appears and you cannot resolve the problem, please fill out the Problem Report Form and send it in right away, so that errors in the program can be eliminated. (For fastest response, fax it to 214-291-3400 or send a Compuserve E-Mail message to 70721,3145.) If you have a suggestion for how to improve KWIKSTAT, fill out the USER'S BALLOT. Thanks.

PART II - USING THE KWIKSTAT DATABASE
---------------------------------------------------------
The DATA pull-down menu is used to manage your data. From this menu you enter data, change data, create new data fields from existing ones, and perform other data maintenance tasks. Once your data is in the KWIKSTAT (dBASE-type) database, you can access the data from any of the other KWIKSTAT modules.

HOW DATA IS STORED IN KWIKSTAT

A KWIKSTAT database uses the same file format as the dBASE III and dBASE IV programs. Therefore, data already stored in a dBASE III or dBASE IV file may be read directly into all the KWIKSTAT programs. The only exception to this is that KWIKSTAT does not read dBASE MEMO fields. Therefore, if your data in dBASE contains memo fields, you may have to create a subset of your database before using it in KWIKSTAT. Data from other programs can also be used in KWIKSTAT. Refer to the section called "Entering Data into a Database."
The following information describes how to use the DATA pull-down menu.

OPENING AN EXISTING DATABASE

The OPEN A DATABASE TO USE option on the DATA menu allows you to access information in a dBASE file. Use this option to choose the database that you will be analyzing. When you choose the OPEN option on the DATA menu, a pick list of databases currently in the default directory will be displayed. To select a database, use the up and down arrow keys to highlight a database name, then press Enter. If the database you want to use is not in the current (default) directory, you can temporarily change the default directory by pressing the F2 function key.

DESIGNING AND CREATING A DATABASE

The CREATE A NEW DATABASE option on the DATA menu is used to create a new database. The structure, or layout, of a database must be described before you enter your data.

NAME YOUR DATABASE

The database name must be a standard DOS file name. DO NOT include an extension in the name. Once you have named the database, you will define the fields -- names of the places where the data will be stored. Each variable (or field) description requires the following information:

   o A field name
   o A field type (character or numeric)
   o A width
   o Number of decimals (if field is numeric)

DEFINE THE FIELDS IN YOUR DATABASE

When you first enter the definition mode, the cursor will be in the FIELD NAME area. Enter a name (such as AGE), and press Enter. In the TYPE area, you only need to enter the first character of the type (N or C), then press Enter. NUMERIC is the default, so if that is your choice, just press Enter when the cursor moves to this area. WIDTH is the number of characters reserved for the entry. DECIMALS is the number of decimal places (only relevant for numbers). Note that the number of decimal places must be at least one less than the width.
For example, if a number has the format ###.##, the width is 6 (count the decimal point), and the number of decimal places is 2.

Once a complete field description is entered, the next blank field description will appear, ready for entry. To end the creation process, type Control-END (^END). The End key is on the numeric pad. As long as you have not ended the procedure, you may use the cursor keys to back up and make any corrections. If you make a mistake you cannot correct, end the procedure with Esc and begin again. If you want to enter data now, answer "Y" to the question

     Enter Records Now (y/N)

Otherwise answer "N". You can always enter the data later, or add to data already in a database.

SPECIFICATION FOR DATABASE FIELDS

   1. The FIELDNAME: 1 to 10 characters, MUST begin with a letter (A to Z).

   2. The TYPE may be:
         CHARACTER - May contain any character.
         NUMERIC   - Must contain numbers only. Examples: 1.00, -4.32, 6, 10000.

   3. The WIDTH of the field: Choose a width so that the maximum number of characters will fit into the field.

   4. DECIMALS: Decimals are only valid for numeric fields. This tells KWIKSTAT how many decimals to retain in the field.

LIMITATIONS

Maximum 250 fields. Maximum width of a cell is 60 characters (15 for numbers). Memo fields are not supported. Date and Logical fields are recognized, but they cannot be used in transformations or subsetting.

ENTERING DATA INTO THE DATABASE

When you choose the Data entry option, you will be asked to specify entry from the keyboard or from a file (ASCII file). For most small data sets, you will probably enter data from the keyboard. If another program supports ASCII or dBASE files, you will be able to enter data from that program into KWIKSTAT. Also, KWIKSTAT contains a translation facility to import data from 1-2-3 type files and comma delimited files. (See Part V, Using KWIKSTAT Utilities.) The following information describes how to enter data from the keyboard or from an ASCII file.

ENTERING DATA FROM THE KEYBOARD

If you choose KEYBOARD data entry, an entry screen will appear containing the fields you created in the CREATE option. Entering data from the keyboard is similar to the way you enter field descriptions when creating a new database. The entry screen displays the name of each field followed by a highlighted entry area where you will type in the contents of the field.

Note: The word FIELD refers to the variable that contains information, such as GROUP or AGE. The word RECORD refers to the entire collection of FIELDS for one entry -- for example, the GROUP, AGE, TIME1, etc. for one person.

While you are entering information into a record, you can use the up and down arrows to move among fields to make corrections. Once you enter information in the last field of a record, KWIKSTAT assumes you have finished entering data for that record, and goes to the next record.

If a record contains too many fields to fit on one screen, KWIKSTAT will display the first 21 fields on the screen. When you have entered information into those fields, the next 21 fields will appear on the screen. This will continue until information has been entered into all fields for the record. If you need to go back to a previously entered record to edit, pressing the PgUp key will automatically place you into edit mode.

IMPORTANT: Once you have finished entering information, you can use either Esc or ^END (Ctrl-End) to end the entry process. (Just as in the dBASE program.) Be careful, since THESE TWO COMMANDS MEAN DIFFERENT THINGS. When you press Esc to end, it means, "DO NOT SAVE the current record." When you use ^END to end, it means "SAVE the current record." Therefore, if you are entering data, and come to the last record, and KWIKSTAT is displaying a blank record beyond the actual data, use Esc to end. If you are on your last record, and it contains information you want to keep, use ^END to end entry.
If you accidentally end up with a blank record in your database, use the Delete and Pack procedure to get rid of it. (See Deleting and Packing.)

ENTERING DATA FROM AN ASCII FILE

KWIKSTAT can read data from ASCII text files. (See also LATENEWS.DOC for information on entering data from a comma delimited file.) These kinds of files are supported by most word processing programs (such as WordPerfect DOS Text Mode). Data must be in the form of column data, like this...

     A 22 3.3 WF
     A 33 4.2 BF
     B 27 3.3 WM
     :  Etc.

Notice that each column of data is in fixed fields. It does not matter that there is no space between the last two fields (Race and Sex), since the program picks off the information by column position and does not require spaces between the columns. Use the instructions below to prepare the KWIKSTAT (dBASE) database structure to be used to read in ASCII data.

The steps to enter ASCII data into KWIKSTAT are:

STEP 1. Use the CREATE option to create a database structure to match the columns in the ASCII file. The field widths MUST match the width of the columns of data on file. If there are spaces between columns of data, make widths wide enough to account for those spaces. The following data is from the file EX.DAT on disk:

     A   12 22.3 25.3 28.2 30.6 5
     A   11 22.8 27.5 33.3 35.8 5
     B   12 22.8 30.0 32.8 31.0 4
     A   12 18.5 26.0 29.0 27.9 5
     :  etc  :
     B   12 22.4 27.2 31.8 35.6 4

Try your hand at doing this example by creating a database named EX with the following structure:

     FIELD NAME   TYPE   WIDTH   DECIMALS
     GROUP        C      2
     AGE          N      4       0
     TIME1        N      5       1
     TIME2        N      5       1
     TIME3        N      5       1
     TIME4        N      5       1
     STATUS       N      2

Notice that even though the first column has data 1 column wide, this structure uses a width of 2 for GROUP. Even though the age only uses 2 columns, the structure calls for AGE to have a width of 4. These widths are entered this way to take care of the blank spaces between the columns.
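The idea of matching field widths to column positions can be sketched in Python. This is only an illustration, not KWIKSTAT's own reader: the function name `parse_fixed_width` is hypothetical, the spacing of the sample line is reconstructed to fit the EX widths, STATUS is assumed to have 0 decimals, and returning `None` for a blank numeric field is a choice made for this sketch.

```python
# Field layout from the EX structure: (name, type, width, decimals).
# STATUS has no decimals entry in the manual; 0 is assumed here.
EX_FIELDS = [
    ("GROUP", "C", 2, 0),
    ("AGE", "N", 4, 0),
    ("TIME1", "N", 5, 1),
    ("TIME2", "N", 5, 1),
    ("TIME3", "N", 5, 1),
    ("TIME4", "N", 5, 1),
    ("STATUS", "N", 2, 0),
]


def parse_fixed_width(line, fields=EX_FIELDS):
    """Slice one fixed-width text line into a record dictionary."""
    record = {}
    pos = 0
    for name, ftype, width, decimals in fields:
        # Each field is taken purely by column position, blanks included.
        raw = line[pos:pos + width].strip()
        pos += width
        if ftype == "C":
            record[name] = raw
        elif raw == "":
            record[name] = None  # blank numeric field (sketch-only choice)
        elif decimals == 0:
            record[name] = int(raw)
        else:
            record[name] = float(raw)
    return record


# Spacing reconstructed to match the widths above (an assumption).
sample = "A   12 22.3 25.3 28.2 30.6 5"
print(parse_fixed_width(sample))
```

Because slicing is positional, a width that is off by one shifts every later field, which is why the widths must account for the blank columns.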
Create the database called EX with the specifications listed above, then go to the next step.

STEP 2. Once you have defined the database to match the ASCII input file, choose the Data entry option from the DATA menu, and choose to read data from a file. You will be prompted to enter the name of the file containing the ASCII data, then the data will be read into the database file.

STEP 3. To verify that the data was read properly, use the List option to examine the database.

USING DBASE TYPE FILES

If the program you are using supports dBASE files, all you have to do is copy the file to the KWIKSTAT data directory.

EDITING RECORDS

When you choose the Edit a record option, you will be asked to specify the record number to edit. Editing is similar to entering data. Use the up and down arrow keys to move from field to field within a record. Use the PgUp and PgDn keys to move forward or backward in the database one record at a time. When you are finished editing a record, use the ^END command to exit from the edit mode.

DELETING RECORDS

If you want to delete an entire record within a database, use the edit procedure to display the record to delete. While a record is displayed, pressing ^U marks the record for deletion. A **DEL** will appear on the screen (upper right corner) of a "deleted" record. You can use PgUp and PgDn to move within the database and mark as many records as you choose. If you accidentally mark a record for deletion, pressing ^U a second time will cancel the mark, and the **DEL** will disappear from the screen.

PACKING THE DATABASE

The records marked for deletion are not actually deleted at this point. However, they will be ignored in most analyses. Once you have marked one or more records for deletion, you may want to permanently get rid of them. To erase all records marked for deletion, choose the Pack procedure from the DATA menu. This procedure erases all "deleted" records from the database.

MODIFYING AND DISPLAYING THE STRUCTURE

The Modify or Display database structure option on the DATA menu allows you to display the structure of your database, and allows you to change characteristics of the database structure. When you choose to display the structure, a list of all field names, with their types, widths and decimals (if any), is displayed.

SETTING MISSING VALUES CODES

Sometimes in the collection of data there are values that are lost or cannot be gathered. These are called "missing values". When such values occur, it is important for the program to know that the values are missing so that statistical calculations may take this into account.

Missing values are usually designated as an impossible value. For example, the missing value designated for the variable AGE may be -9, since it is impossible for the variable AGE to have the value -9. When the program is asked to calculate the mean of age, for example, it will ignore those records where AGE is -9 in that calculation if -9 has been specified as the missing value code. In most KWIKSTAT procedures, there is a casewise deletion of the record from calculation whenever a missing value is encountered.

Once you designate a missing value code for a variable, it is up to you to make sure that this code gets placed into your database in the proper records and fields. For example, if you have designated -9 as the missing value code for AGE, you must make sure that in your database a -9 appears in the field AGE if that data is missing or unknown.

A standard dBASE file does not have a way to designate missing values, but KWIKSTAT provides a way for you to designate these values. The Indicate missing value codes option on the DATA menu is used to set up these values. When this option is selected, the program will display an entry screen that is similar to a data entry screen. You may enter one missing value for each field name.
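The effect of a missing value code on a mean can be sketched as follows. This is a minimal Python illustration of the rule described above, not KWIKSTAT's code; the function name and the sample ages are hypothetical.

```python
def mean_ignoring_missing(values, missing_code):
    """Mean of values, skipping entries equal to the missing value code."""
    kept = [v for v in values if v != missing_code]
    if not kept:
        return None  # nothing left to average
    return sum(kept) / len(kept)


ages = [23, 21, -9, 34, 33]  # hypothetical data; -9 marks a missing AGE

with_code = mean_ignoring_missing(ages, -9)  # averages 23, 21, 34, 33
naive = sum(ages) / len(ages)                # wrongly treats -9 as a real age

print(with_code, naive)
```

The second figure shows why the code must be declared: an undeclared -9 is averaged in like any other number and pulls the mean down.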
The missing value must obey the definition of the field in terms of length and type. Once missing values are entered, they are stored on disk in a file named filename.MV, where "filename" is the name of the designated database. If a new variable is created using the transformation procedure, its missing value is appended to the missing value file.

You may change or correct the missing values for a database at any time by calling up this option. If missing values are already designated for the database, they will be displayed on the entry screen, and you may edit them or accept them as they are.

IMPORTANT NOTE: If missing values are NOT used, and there is a blank numeric variable in a calculation, it will be treated like the value 0 (zero), so it is important to use missing values if your data contains such entries. Otherwise, the statistical calculations will be in error!!

MAKING A VARIABLE BY TRANSFORMATION

You may create a new numeric variable in a database by choosing the Transformation option. For example, if you wanted a new variable to be the ratio of WEIGHT to HEIGHT, you could name a new variable RATIO, and use the transformation WEIGHT/HEIGHT as the expression to create the new variable.

When you request the TRANSFORMATION procedure, you will

   o Define a name for the new field.
   o Define a width for the new variable.
   o Define the number of decimals, if any.
   o Define a missing value code. If none is selected, it is assumed to be 0 (zero).

CAREFUL ATTENTION must be paid to the definition to assure that the calculated numbers will fit into the field width specifications. If the calculated number is too large to fit into the field, it will be given the missing value code. If an illegal calculation is attempted, such as a division by 0, the result will be missing. If a calculation includes a missing value, the result will be a missing value.
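The three rules for missing results can be sketched for the RATIO = WEIGHT/HEIGHT example. This Python sketch is an assumption-laden illustration: `transform_ratio` is a hypothetical name, a single missing code is used for all three variables to keep it short, and the "fits the field" check simply compares the formatted text length with the width.

```python
def transform_ratio(weight, height, missing, width, decimals):
    """Sketch of NEW = WEIGHT/HEIGHT under the rules described above."""
    if weight == missing or height == missing:
        return missing  # a missing input gives a missing result
    if height == 0:
        return missing  # illegal calculation (division by zero)
    text = f"{weight / height:.{decimals}f}"
    if len(text) > width:
        return missing  # result does not fit the field width
    return float(text)


# Field defined with width 5 and 2 decimals, missing code -9 (all hypothetical).
print(transform_ratio(150.0, 60.0, -9, 5, 2))     # fits the field
print(transform_ratio(150.0, 0.0, -9, 5, 2))      # division by zero
print(transform_ratio(-9, 60.0, -9, 5, 2))        # missing input
print(transform_ratio(100000.0, 1.0, -9, 5, 2))   # too wide for the field
```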

TRANSFORMATIONS SUPPORTED

KWIKSTAT supports standard mathematical operations and functions, as described below:

Mathematical operators: Add (+), Subtract (-), Multiply (*), Divide (/) and Exponentiation (^).

Following are a few examples of correct expressions:

     NEW = AGE/HEIGHT
     NEW = SUM(AGE,WEIGHT,HEIGHT,SCORE)
     NEW = PI * (SCORE ^ 2)

Notice that SUM is a function. KWIKSTAT supports over 20 functions, including ABS, ACOS, ASIN, ATAN, ATAN2, CSC, COS, COT, EXP, INT, LN, LOG, MAX, MIN, MOD, PI, RAND, RECNO, RECODE, ROUND, SEC, SIN, SQRT, SUM and TAN. The RECODE function is defined as follows:

     NEW = RECODE(SCORE,1,0,10,15)

means NEW = 1 if SCORE is between 10 and 15, else NEW = 0.

SUBSETTING THE DATABASE

The Subset database option on the DATA menu allows you to create a new database from an old database. The new database can be a subset of the old one, using a conditional criterion for outputting information from the old database to the new one. For example, suppose you have a database with a field GROUP with values 1, 2, 3, 4 and 5. You want to create a database that does NOT include Group 5.

After choosing Subset database from the DATA menu, you are asked for the name of the new database. For example, your new database might be named NO5.DBF. You are then asked for the field name to be used in the selection criterion. In this case, you would choose the field named GROUP.

Next you must enter the selection relationship. It will be described as a numerical expression. The conditional operators you may use are:

     =  >  <  >=  <=  <>

and the logical operator ".NOT.". The program will prompt you with

     SELECT IF GROUP

and you must finish the selection criterion. For example:

     SELECT IF GROUP .NOT. = 5

(Select records for which the variable GROUP is not equal to 5.)

You may use all of the variables in the database in the expression, and you may use the functions described in the Transformation option.
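RECODE and SELECT IF can be mimicked in a few lines of Python to show what each produces. This is a hedged sketch rather than KWIKSTAT's implementation: the `recode` helper and the three sample records are hypothetical, and treating "between 10 and 15" as inclusive of both ends is an assumption, since the manual does not say.

```python
def recode(value, in_range, otherwise, low, high):
    """Mimic RECODE(SCORE,1,0,10,15): in_range if low <= value <= high.

    Inclusive bounds are an assumption; the manual only says "between".
    """
    return in_range if low <= value <= high else otherwise


records = [
    {"GROUP": 1, "SCORE": 12},
    {"GROUP": 5, "SCORE": 9},
    {"GROUP": 3, "SCORE": 15},
]  # hypothetical records

# NEW = RECODE(SCORE,1,0,10,15)
for rec in records:
    rec["NEW"] = recode(rec["SCORE"], 1, 0, 10, 15)

# SELECT IF GROUP .NOT. = 5
subset = [rec for rec in records if not (rec["GROUP"] == 5)]

print([rec["NEW"] for rec in records])
print([rec["GROUP"] for rec in subset])
```

The subset is built as a new list and the original records are left alone, in the same way that Subset database writes a new database instead of changing the old one.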
For example, other selections might be

     SELECT IF GROUP = 4
     SELECT IF GROUP > STATUS
     SELECT IF GROUP < WEIGHT*HEIGHT
     SELECT IF TIME1 = TIME2*1.96

LISTING THE DATABASE TO THE SCREEN

The LIST option on the DATA menu allows you to look at the information in your database.

-------------------------------------------------------------------
DO THIS TUTORIAL TO LEARN ABOUT CREATING A DATABASE
-------------------------------------------------------------------

TUTORIAL: YOUR TURN - GIVE IT A TRY

Suppose you are given data from an experiment. The data are from a sample of 15 hogs (randomized to four groups) that have been given one of four feeds. The measured response for this experiment is weight gain. The data are summarized below:

     FEED1   FEED2   FEED3   FEED4
     60.8    78.7    92.6    86.9
     67.0    77.7    84.1    82.2
     54.6    76.3    90.5    83.7
     61.7    79.8            90.3

This will be analyzed as a One-Way Analysis of Variance. For that procedure, you must have a grouping variable (FEED) and a response variable (WEIGHT GAIN). Therefore, the database to be created for this data will have two variables. You can call them FEED and WEIGHT. Before entering the data, you must first create a new database:

STEP 1. Begin the KWIKSTAT program from the DOS prompt with the KS command. From the DATA menu, choose the option Create a new database.

STEP 2. The database creation screen will appear. On this screen, define the two fields. Field one will be named FEED; it will be of NUMERIC type with a width of 2, with no decimals. Field two will be named WEIGHT; it will be of NUMERIC type with a width of 5 and 1 decimal place. (NOTE: Widths could actually be 1 and 4 instead of 2 and 5, but the one extra space gives you some room for unexpectedly large values and makes data entry easier.)
Enter the two field specifications on the creation screen:

     FIELD 1: FEED, NUMERIC, WIDTH 2, NO DECIMALS
     FIELD 2: WEIGHT, NUMERIC, WIDTH 5, 1 DECIMAL PLACE

End the database creation process by entering a ^END (hold the CTRL key down with one finger, and press the End key with another finger).

STEP 3. You will be asked if you want to enter the data now. Answer Y for yes. The data entry screen will appear. On the first screen (the first record), you will enter the following values:

     FEED   N: 1       <--- You enter the 1
     WEIGHT N:60.8     <--- You enter the 60.8

The next record will be FEED=1, WEIGHT=67.0, and so on. The following table lists the values you will enter into the database:

     FEED   WEIGHT
     1      60.8
     1      67.0
     1      54.6
     1      61.7
     2      78.7
     2      77.7
     2      76.3
     2      79.8
     3      92.6
     3      84.1
     3      90.5
     4      86.9
     4      82.2
     4      83.7
     4      90.3

It is important to understand how KWIKSTAT reads this data from the database. The FEED variable places the WEIGHT values in one of four groups. Thus, KWIKSTAT "knows" that the number 77.7 belongs to group (FEED) 2, and so on.

STEP 4. Once you have entered the data for the 15 hogs into the database, the screen should be displaying the data entry screen for record 16, which does NOT contain any data. To end the entry procedure and NOT save the empty record 16, press the Esc key. Your data is now saved in the database.

STEP 5. Always verify that your data is correctly entered by choosing the List option from the DATA menu. This lists the values to the screen. If everything looks okay, you are ready to analyze your information.

The various procedures in KWIKSTAT expect the data to be stored in a particular way to perform a statistical test. Refer to the sections on each procedure for examples of how to design your database to match the expectations of that procedure.
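The move from the four-column feed table to one record per hog can be sketched in Python. This only illustrates the layout, not how KWIKSTAT stores data internally; the variable names are hypothetical and the values are those of the tutorial.

```python
# Weight gains by feed, as in the tutorial table (FEED3 has three hogs).
by_feed = {
    1: [60.8, 67.0, 54.6, 61.7],
    2: [78.7, 77.7, 76.3, 79.8],
    3: [92.6, 84.1, 90.5],
    4: [86.9, 82.2, 83.7, 90.3],
}

# One record per hog: a grouping variable (FEED) and a response (WEIGHT).
records = [(feed, w) for feed, weights in by_feed.items() for w in weights]

# Recover the groups from the records, the way a grouping variable is used.
groups = {}
for feed, w in records:
    groups.setdefault(feed, []).append(w)

for feed in sorted(groups):
    vals = groups[feed]
    print(f"FEED {feed}: n={len(vals)} mean={sum(vals) / len(vals):.2f}")
```

Storing one observation per record lets the groups have unequal sizes without padding, which is why FEED 3 simply has three records.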

PART III - A REVIEW OF STATISTICAL CONCEPTS
-----------------------------------------------------------
The Data Generations and Simulations module contains several simulations that can be used to demonstrate statistical concepts.

PART IV - PERFORMING A STATISTICAL ANALYSIS
------------------------------------------------------------
This section of the KWIKSTAT manual describes the statistical analysis procedures available in the basic KWIKSTAT program.

USING DESCRIPTIVE STATISTICS AND GRAPHS

The Descriptive Statistics and Graphs module allows you to examine summary statistics of the data in a database. Procedures and graphics in this module include:

DETAILED STATISTICS ON A SINGLE VARIABLE

This option calculates the mean, standard deviation, median, standard error of the mean, minimum, maximum, sum, and variance of a set of data. KWIKSTAT also calculates five percentiles and computes a two-sided confidence interval about the mean. You have the opportunity to specify the five percentiles to be calculated, as well as the level of confidence of the confidence interval.

EXAMPLE 4.1: DESCRIPTIVE STATISTICS ON A SINGLE VARIABLE

Suppose you have the following data on seven persons, and you want to know the average age of persons in the group being weighed.

     Data for Age/Weight Example

     Person   Age   Weight
     1        23    140.0
     2        21    133.5
     3        34    200.0
     4        33    150.0
     5        40    296.5
     6        28    167.0
     7        25    175.5

CREATING THE DATABASE

The database (called AGELBS in this example) will include seven records (one for each of the seven persons) and two fields (for the two variables, age and weight). That is, in each record two pieces of information about that person (age and weight) will be entered.

          FIELD NAME   TYPE      WIDTH   DECIMALS
     1    AGE          Numeric   3       0
     2    WEIGHT       Numeric   6       1

You will be asked if you want to enter records now. Answer Yes by typing Y and pressing Enter. A data entry screen will appear where you will enter the data. The data you will enter in the first record is 23 (press Enter) and 140.0 (press Enter).
Enter the data for the seven records. Refer to the example in the database tutorial for entering the data. After entering the data, examine the AGELBS database by choosing the "List (display) the contents of a database" option from the Data menu. The data in the database should look like this:

     RECNO   AGE   WEIGHT
       1      23    140.0
       2      21    133.5
       3      34    200.0
       4      33    150.0
       5      40    296.5
       6      28    167.0
       7      25    175.5

If your data do not look like this, use the "Edit a record" option to correct errors.

PERFORMING THE ANALYSIS

Once you have entered the data into a database, choose the DESCRIPTIVE STATISTICS AND GRAPHS module from the Analyze pull-down menu. The Descriptive Statistics and Graphs menu will appear. Select DETAILED STATISTICS ON A SINGLE VARIABLE. You will be prompted to choose the field name of the variable on which you wish to calculate summary statistics. In this case, enter 1, which chooses AGE.

You will then be asked to specify the level of confidence for the confidence interval. If you want a 95% C.I., simply press Enter for the default setting. If you want, say, a 99% interval, type .99.

Next, you will be asked if you want to specify percentiles other than the Tukey 5 number summary (0, 25th, 50th, 75th, 100th). If you answer yes, you will be prompted for the five percentiles you want.

KWIKSTAT will perform the calculations and display the results. Figure 4.1 shows the results of the summary statistics procedure on the AGE variable using default settings for percentiles and a 95% C.I.

SUMMARY STATISTICS ON A NUMBER OF VARIABLES

This option is similar to the Detailed statistics on a single variable option above, but here several variables can be summarized at once using descriptive statistics (sample size, mean, standard deviation, minimum, maximum, and standard error of the mean). If you have a grouping variable in your database, you may request output of summary statistics by group.
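As a cross-check on Example 4.1, the summary statistics for AGE can be recomputed outside KWIKSTAT. The following is a minimal sketch using Python's standard library, not KWIKSTAT's own method. The function name describe is an assumption, and the value 2.447 is the tabled two-sided 95% Student's t value for 6 degrees of freedom, supplied by hand.

```python
import math
import statistics

# AGE values for the seven persons in Example 4.1.
AGE = [23, 21, 34, 33, 40, 28, 25]


def describe(values, t_crit):
    """Return basic summary statistics and a two-sided C.I. for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample standard deviation (n - 1)
    sem = sd / math.sqrt(n)              # standard error of the mean
    half_width = t_crit * sem
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "sem": sem,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "variance": statistics.variance(values),
        "ci": (mean - half_width, mean + half_width),
    }


# Assumption: 2.447 is read from a t table (95% two-sided, 6 d.f.).
summary = describe(AGE, t_crit=2.447)
for name, value in summary.items():
    print(name, value)
```

Percentiles are left out of this sketch because several conventions exist for computing them and the manual does not say which one KWIKSTAT uses.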
You are also given the opportunity to print results to the printer, or to output results to a file.

APPROXIMATE P-VALUE DETERMINATION

This option calculates p-values for entered values of four test statistics: normal (z), Student's t, F, and chi-square. If you designate the statistic being used, the degrees of freedom and the calculated value of the test statistic, KWIKSTAT will tell you the p-value associated with that test statistic.

PRODUCING A HISTOGRAM

This procedure produces a histogram from values read from a database. A histogram can be helpful in determining if the distribution of a continuous variable is approximated by a normal distribution. If the histogram has a peak toward the center, with both tails diminishing, the data could be considered to be approximated by a normal distribution.

PRODUCING AN XY-PLOT (SCATTERPLOT)

This option enables you to produce a scatterplot of two variables. A scatterplot is simply a plot of all the data values, with one variable plotted against the other. Such a plot is helpful in determining if two variables are related, and if the relationship is linear (a straight line), curvilinear, or something else.

TIME SERIES PLOT

This option enables you to produce a time-series plot for one variable. This plot is useful in examining data that is time related, such as profit by month. The X axis is assumed to be "time". The data values must be entered into records in the chronological order in which the observations occurred, i.e., the first record must contain the results of the first observation (first time period), and so on. Use the UNEMP variable in the LONGLEY database to see an example of graphing an observation over time.

USING T-TESTS AND ANOVA PROCEDURES

T-tests and Analysis of Variance (ANOVA) procedures are used to test hypotheses about population means using data obtained through random sampling of those populations.
PARAMETRIC INDEPENDENT GROUP ANALYSIS

Independent group analysis is appropriate when observations are taken from groups in which subjects in one group do not appear in another group. In this module, a t-test is performed when there are two groups, and an ANOVA is performed when there are three to ten groups being compared. When performing a t-test or ANOVA on two or more independent groups, you are testing the hypotheses:

     Ho: The difference in the means of the groups is zero.
     Ha: The difference in the means of the groups is not zero.

For a two-sample t-test, two t-statistics are calculated, one for the case in which the variances of the two samples are equal and the other for use in the case of unequal variances. KWIKSTAT tests the hypothesis that the two variances are equal and reports a p-value. If this p-value is small (e.g., less than 0.05), the hypothesis of equal variances is rejected and you use the t-statistic for unequal variances. If the p-value is large, use the t-statistic for equal variances.

EXAMPLE 4.7: TWO SAMPLE T-TEST (INDEPENDENT GROUPS)

The data used here are heights of 13 plants grown using two different fertilizers. Suppose you want to know if there is a difference in the average heights of plants in the two treatment groups.

     Data for independent group t-test (fertilizer study)

     Present Fertilizer    Newer Fertilizer
          46.2 cm               51.3 cm
          55.6                  52.4
          53.3                  54.6
          44.8                  52.2
          55.4                  64.3
          56.0                  55.0
          48.9

In order to enter this data into a database, you must assign group numbers (or letters) such as Present = 1 and Newer = 2, or you could use P and N (if the variable is of the character type).

Since the observations are independent, the database will include thirteen records (one for each plant) and two fields (one for the response and one for the group indicator).
     FIELD   NAME     TYPE      WIDTH   DECIMALS
       1     GROUP    Numeric     5        0
       2     HEIGHT   Numeric     5        1

You can choose any field name of up to ten characters. You may want to use FERTILIZER instead of GROUP, for example.

You will be asked if you want to enter records now. Answer Yes by typing Y and pressing Enter. A data entry screen will appear where you will enter the data. The data you will enter in the first record is 1 (press Enter) and 46.2 (press Enter). Enter the data for the thirteen records. For each record of a "Present Fertilizer" observation, enter "1" for the GROUP variable. For the "Newer" observations, enter "2" for the GROUP variable. The second record is 1 and 55.6. The eighth record is 2 and 51.3. From the List option, the data in the database should look like this:

     RECNO   GROUP   HEIGHT
       1       1      46.2
       2       1      55.6
       3       1      53.3
       4       1      44.8
       5       1      55.4
       6       1      56.0
       7       1      48.9
       8       2      51.3
       9       2      52.4
      10       2      54.6
      11       2      52.2
      12       2      64.3
      13       2      55.0

Notice that the GROUP field is 1 if the data are from the Present Fertilizer group and 2 if the data are from the Newer Fertilizer group.

PERFORMING THE ANALYSIS

Once you have entered the data into a database, select the T-TESTS AND ANALYSIS OF VARIANCE option from the Analyze menu. Then select the COMPARE INDEPENDENT GROUPS option. You will be prompted to choose the field name of the grouping variable, which in this case is simply GROUP. Enter 1, which chooses GROUP. Next, you will be asked for the data field. Enter 2, which chooses HEIGHT, the response variable.

KWIKSTAT will now perform the calculations and display the results on the screen, as illustrated in Figure 4.6. The means for each group (1=Present, 2=Newer) are displayed. A test for equality of variance is also performed to see if the variances of the two groups can be considered equal. This is necessary for deciding which t-statistic and p-value to use for the test on means. A p-value for the equal variances test is displayed.
A large p-value (e.g., greater than 0.05) indicates that you can consider the variances to be equal. In this case, p=0.4807, large enough to consider the variances to be equal. If the variances are equal, according to this test, you use the "Equal variances" t-statistic. Otherwise, use the "Unequal variances" result. In this case, the two t-statistics are identical at -1.32.

The t-test is performed with 11 degrees of freedom, and the p-value associated with the test is 0.213. A large p-value (greater than the significance level, e.g., 0.05) is usually interpreted to mean that there is no significant difference in the means -- the null hypothesis of equal means is not rejected. That is, there is not enough evidence to conclude that the average height of plants grown with the newer fertilizer is significantly different from the average height of plants grown with the present fertilizer.

Type G for a graphical comparison of the two samples. Tukey's five number summaries and box plots will appear. Press Esc to continue and you will be given the option to print a report for this analysis.

EXAMPLE 4.8: SINGLE FACTOR ANOVA

When more than two independent groups are compared with respect to one variable, one-way or single factor analysis of variance techniques are appropriate. This example uses data for hogs which have been randomly assigned to four groups, with each group being given a different feed. The response is weight gain.

     Data for Independent Group ANOVA

     GP1     GP2     GP3     GP4
     60.8    78.7    92.6    86.9
     67.0    77.7    84.1    82.2
     54.6    76.3    90.5    83.7
     61.7    79.8            90.3

The database to analyze this data is similar to the one used for Example 4.7 above, differing only with respect to the number of groups. In fact, this one-way ANOVA is an extension of the t-test when there are three or more groups. See the tutorial in the database section for information on how to create and enter this database.
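The test statistics for Examples 4.7 and 4.8 can be recomputed by hand from the data tables. The sketch below is an illustration in Python rather than KWIKSTAT's own code; the function names pooled_t and one_way_anova are assumptions. It computes the equal-variances t-statistic for the fertilizer data and the one-way ANOVA F-statistic for the hog data. P-values are omitted because they require t and F distribution functions that are not in Python's standard library.

```python
import math

# Example 4.7: plant heights under the two fertilizers.
PRESENT = [46.2, 55.6, 53.3, 44.8, 55.4, 56.0, 48.9]
NEWER = [51.3, 52.4, 54.6, 52.2, 64.3, 55.0]

# Example 4.8: hog weight gain for the four feed groups.
HOGS = [
    [60.8, 67.0, 54.6, 61.7],
    [78.7, 77.7, 76.3, 79.8],
    [92.6, 84.1, 90.5],
    [86.9, 82.2, 83.7, 90.3],
]


def mean(xs):
    return sum(xs) / len(xs)


def sum_sq_dev(xs):
    """Sum of squared deviations from the group mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def pooled_t(a, b):
    """Equal-variances two-sample t-statistic and its degrees of freedom."""
    df = len(a) + len(b) - 2
    pooled_var = (sum_sq_dev(a) + sum_sq_dev(b)) / df
    se = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, df


def one_way_anova(groups):
    """F-statistic with between-group and within-group degrees of freedom."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum_sq_dev(g) for g in groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


t, t_df = pooled_t(PRESENT, NEWER)
f_two, _, _ = one_way_anova([PRESENT, NEWER])
f_hogs, df_b, df_w = one_way_anova(HOGS)

print("t =", round(t, 2), "with", t_df, "d.f.")
print("F for the same two groups =", round(f_two, 3), "(equals t squared)")
print("F for hog data =", round(f_hogs, 2), "with", df_b, "and", df_w, "d.f.")
```

Running the ANOVA on just the two fertilizer groups gives an F-statistic equal to the square of the pooled t-statistic, which is the sense in which one-way ANOVA extends the t-test.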
The results of this test are summarized in the p-value. In this case, the small p-value (0.000) means that there is a significant difference between groups. The ANOVA tells you only that there is a difference among the feeds. In order to find out which groups are significantly different from which others, press M to choose (M)ultiple comparison. The Newman-Keuls multiple comparison test will describe which of the means are significantly different from which others (at the 0.05 significance level). Figure 4.8 displays a graphical representation of the Newman-Keuls multiple comparisons test. See the example of Friedman's test for how to interpret the Newman-Keuls chart.

Box plots are also available to graphically illustrate the differences between the groups. Type G (for graphical comparison) and press Enter to produce the plots.

PARAMETRIC REPEATED MEASURES (PAIRED) ANALYSIS

Repeated measures are observations taken on the same or related subjects over time or in differing circumstances. Examples would be weight loss, or reaction to a drug across time. Repeated measures may also be taken on matched subjects. A t-test is performed when there are two groups (two repeated measures), and an analysis of variance is performed if there are three to ten groups.

In a database for paired or repeated measures data, each record represents one subject (e.g., person, animal). There must be one field for each repeated measure (each treatment group). For paired data, there are two groups, hence two fields. Thus, in each record, there is a field in which to enter data from each observation (treatment) on that subject.

The hypotheses being tested with a paired t-test or a repeated measures ANOVA are:

     Ho: There is no difference among means of the groups (repeated measures).
     Ha: There is a difference among means of the groups.

EXAMPLE 4.9: PAIRED T-TEST

The data in this example are before and after weights for eight persons on a diet.
Notice that in this case, both data values are taken from the SAME entity (person).

     Data for paired t-test

     Person   Before   After
        1       162     168
        2       170     136
        3       184     147
        4       164     159
        5       172     143
        6       176     161
        7       159     143
        8       170     145

The database will include two fields (BEFORE and AFTER) and eight records, one for each person. Since the observations are paired, not independent, the database reflects this by having each record contain a pair of observations. Each record, that is, each person, is independent of the other seven persons, but within a record, the before and after observations are not independent of each other.

     FIELD   NAME     TYPE      WIDTH   DECIMALS
       1     BEFORE   Numeric     5        0
       2     AFTER    Numeric     5        0

As a result of the analysis, the means and standard deviations for each group are displayed, but more importantly, the mean difference between BEFORE and AFTER measurements is given. The statistical procedures are performed on this average difference. A 95% confidence interval for the mean difference is given, as well as a calculated t-statistic and a p-value. These results are interpreted like those of a single sample t-test with null hypothesis: mean=0, and alternative hypothesis: mean <> 0.

For the data as listed above, the mean difference (BEFORE minus AFTER) is 19.375 with a standard deviation of 14.78, so the calculated t-statistic is 19.375 / (14.78 / sqrt(8)) = 3.71 with 7 degrees of freedom. The p-value associated with the test is 0.008. A small p-value such as this is usually interpreted to indicate rejection of the null hypothesis and leads to the conclusion that the average difference in BEFORE and AFTER weights is not zero, i.e., there is evidence of a significant (at the 0.05 level) change of weight in these eight subjects on average.

EXAMPLE 4.10: ONE-WAY REPEATED MEASURES ANOVA

For more than a pair of repeated measures on the same subject, a one-way repeated measures analysis of variance is appropriate. The data in this example are repeated measures of reaction times of five persons after being treated with four drugs in randomized order.
     One-way repeated measures ANOVA data

     Person   Drug 1   Drug 2   Drug 3   Drug 4
        1       31       29       17       35
        2       15       17       11       23
        3       25       21       19       31
        4       35       35       21       45
        5       27       27       15       31

Create a database (named, e.g., MEDICINE) with field names such as DRUG1, DRUG2, DRUG3, DRUG4. For the first record, enter the data for the first person: 31,29,17,35. The second record will contain 15,17,11,23 and so forth.

You will be prompted to choose the fields which you wish to compare. Enter 1,2,3,4. KWIKSTAT will now perform the calculations and display the results on the screen. The results of this ANOVA are summarized in the p-value. In this case, the small p-value (p=0.000) means that there is a statistically significant difference in the mean response times for the four drugs. If you want to determine which of the four drugs are significantly different from which others, press M for Multiple comparison. The Newman-Keuls multiple comparison test will describe which of the means are significantly different from which others (at the 0.05 significance level).

INDEPENDENT GROUP TESTS FROM SUMMARY DATA

This option allows you to perform a one-way ANOVA or a t-test if you have only the means, standard deviations and group sizes of two to ten groups. Since only summary data are available, no box plots can be produced.

SINGLE SAMPLE ANALYSIS

This option allows you to choose a single variable and test a hypothesis that the mean differs from a hypothesized mean. You must enter the hypothesized population mean. The hypotheses you are testing in this case are:

     Ho: The mean equals the hypothesized value.
     Ha: The mean does not equal the hypothesized value.

USING NON-PARAMETRIC COMPARATIVE PROCEDURES

Non-parametric procedures are appropriate when the assumption of normality cannot be made for a small data set or when a large data set is known to be from a non-normal population.
Non-parametric procedures are generally based on ranks rather than actual data values, so these procedures can also be useful when actual data values are not known, but the order or ranks of the data values are known.

NON-PARAMETRIC INDEPENDENT GROUP ANALYSIS - MANN-WHITNEY AND KRUSKAL-WALLIS TESTS

In the Non-Parametric Comparison Tests Module, KWIKSTAT uses the Mann-Whitney procedure if two independent groups are being compared, and the Kruskal-Wallis procedure if three or more groups are being compared. The hypotheses being tested are:

     Ho: There is no difference in the medians of the groups.
     Ha: There is a difference in the medians of the groups.

EXAMPLE 4.15: MANN-WHITNEY NON-PARAMETRIC TEST OF TWO INDEPENDENT GROUPS

This example uses the data from Example 4.7, stored in the database named FERTILIZ.

KWIKSTAT will perform the calculations and display the results, including the Mann-Whitney U statistic, the rank sums, sample sizes and mean ranks of the groups, a z statistic and an approximate p-value. In this case, U=24.00, z=0.421 and p=0.673. The p-value is large, so the null hypothesis of no difference in medians between groups is not rejected. There is not sufficient evidence based on this procedure to say that there is a difference between the median heights of plants in the two groups grown using different fertilizers.

KRUSKAL-WALLIS PROCEDURE

If more than two independent groups are being compared using non-parametric methods, KWIKSTAT uses the Kruskal-Wallis test.

NON-PARAMETRIC REPEATED MEASURES ANALYSIS - FRIEDMAN'S TEST

When repeated observations are taken on the same subject, and there is interest in comparing the observations for each repeated measure (e.g., each type of treatment), then a repeated measures analysis may be appropriate.
If you cannot make the assumption that the data being observed are normally distributed with equal variances between repeated measures, then a non-parametric analysis is appropriate. One method of performing a non-parametric one-way analysis of variance (ANOVA) with repeated measures (randomized complete block experimental design) is with the Friedman test. (When there are only two groups, this test is equivalent to the sign test.) The hypotheses for the Friedman test are:

     Ho: There is no difference in mean ranks between repeated measures.
     Ha: There is a difference in mean ranks between repeated measures.

The following data are the same data used in a previous example for a standard repeated measures ANOVA:

     One Way Repeated Measures ANOVA Data

     Person   Drug 1   Drug 2   Drug 3   Drug 4
        1       31       29       17       35
        2       15       17       11       23
        3       25       21       19       31
        4       35       35       21       45
        5       27       27       15       31

The data presented here are repeated measures of reaction times of 5 persons after being given 4 drugs in randomized order. For a Friedman test, the analysis is performed by ranking the data within each of the 5 subjects. KWIKSTAT calculates Friedman's Chi-Square and reports the p-value associated with the test. If the resulting p-value is low (usually less than 0.05), it is appropriate to examine multiple comparisons of the groups (repeated measures).

To perform the FRIEDMAN test, choose the REPEATED MEASURES ANALYSIS option from the Non-Parametric Tests menu. You will be prompted to choose the field names that represent the repeated measures you want to compare. In this case, enter

     1,2,3,4

which chooses fields DRUG1, DRUG2, DRUG3 and DRUG4. KWIKSTAT will now perform the calculations and display the results on the screen. For this data set, a Chi-Square value of 14.13 and a small p-value (p=0.00, which means p < 0.005) are reported. The small p-value means that there is a statistically significant difference in the mean ranks of times for the four drugs.
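The within-subject ranking described above can be followed step by step in a short sketch. This is an illustration in Python, not KWIKSTAT's code; the function names are assumptions. It uses average ranks for tied values and the usual tie correction, which for these data gives the reported Chi-Square of 14.13. The p-value helper is specific to 3 degrees of freedom, where the chi-square upper tail has a closed form.

```python
import math

# Reaction times: one row per person, one column per drug (DRUG1..DRUG4).
DRUG_TIMES = [
    [31, 29, 17, 35],
    [15, 17, 11, 23],
    [25, 21, 19, 31],
    [35, 35, 21, 45],
    [27, 27, 15, 31],
]


def midranks(row):
    """Rank the values in one row, giving tied values their average rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        average = (i + j) / 2 + 1        # positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def friedman(rows):
    """Tie-corrected Friedman chi-square and the rank sum of each column."""
    n, k = len(rows), len(rows[0])
    ranked = [midranks(r) for r in rows]
    rank_sums = [sum(r[c] for r in ranked) for c in range(k)]
    stat = 12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)
    tie_term = 0
    for row in rows:
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    stat /= 1 - tie_term / (n * k * (k * k - 1))
    return stat, rank_sums


def chi2_upper_tail_3df(x):
    """Upper-tail probability of chi-square with exactly 3 d.f."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


stat, rank_sums = friedman(DRUG_TIMES)
p_value = chi2_upper_tail_3df(stat)      # k - 1 = 3 degrees of freedom
print("rank sums:", rank_sums)
print("chi-square:", round(stat, 2), " p:", round(p_value, 4))
```

The rank sums show the ordering used in the multiple comparison chart: DRUG 3 has the smallest rank sum and DRUG 4 the largest.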
Press Enter to see the results of the Newman-Keuls multiple comparison test. This test describes which of the mean ranks are significantly different from the others (at the 0.05 significance level). In this case, the following results are reported from the multiple comparison procedure:

                     Gp    Gp    Gp    Gp
                      3     2     1     4
     Population 1   ----------------
     Population 2         ----------------

This table is interpreted in the following way: Any two groups underlined by the same line are considered not different at the 0.05 level of significance. Therefore, the result of this analysis is that the mean rank for DRUG 3 is less than the mean rank for DRUG 4. There are no other statistically significant pairwise differences among the four groups.

NON-PARAMETRIC DICHOTOMOUS DATA ANALYSIS - COCHRAN'S Q

Cochran's Q procedure is a non-parametric procedure appropriate for use with dichotomous data when the experiment involves repeated measures on blocks. Often the blocks are subjects (people or animals). The response of the subjects to the treatments is dichotomous if it is taken as one of only two possible outcomes, often labeled "success" and "failure", rather than as a measurement. Cochran's Q is used to test three or more treatments, or groups, and is in fact an extension of McNemar's test for two groups. The hypotheses being tested are:

     Ho: The proportion of successes is the same for all treatments.
     Ha: The proportion of successes is not the same for all treatments.

USING REGRESSION & CORRELATION PROCEDURES

To examine the linear relationship between variables, correlation and linear regression are used.

SIMPLE LINEAR REGRESSION ANALYSIS

Simple linear regression is used for predicting a value of a dependent variable using an independent variable. To begin the regression module, choose the Regression and Correlation option from the Analyze menu in the main KWIKSTAT module.
When you choose the Simple Linear Regression option, KWIKSTAT will prompt you to give the "independent" and "dependent" variables to be used in the analysis. The "independent" variable is generally that variable that you can choose, regulate or specify (e.g., amount of money spent on advertising) and the "dependent" variable is the one you observe and would possibly like to predict. After the two variables are chosen, KWIKSTAT will present the results of its calculations. The regression equation will be displayed along with other results. If the fit is appropriate, the equation may be used to predict a new value of the dependent variable given the value of the independent variable, within the range of the original data. The Pearson correlation coefficient is a number between -1.0 and 1.0, and tells the strength of the linear relationship between the two variables. A correlation coefficient close to -1 or 1 means that the relationship is strong, and a correlation close to 0 means that a relationship is non-existent or very weak. KWIKSTAT also performs a t-test for significance of the slope of the regression line (Ho: Slope = 0, Ha: Slope <> 0). This test is equivalent to testing whether the population correlation coefficient rho = 0. If the p-value is small (less than the chosen significance level) you can conclude that the slope of the regression line is not zero. That is, the linear relationship is statistically significant. Scatterplots of raw data and plots of residuals from linear fit are optionally available. Plots are helpful in visually examining the relationship between the variables. It is important to verify that the relationship is indeed a straight line. Since regression and correlation are used to relate variables to each other, the database must be structured so that each record contains values for each variable. The records often represent time periods or locations from which an observed value for each variable is available. 
The fields, then, are the variables and you are asked for a value for each field in each record.

EXAMPLE 4.19: SIMPLE LINEAR REGRESSION ANALYSIS

Data for this example of simple linear regression are Homicide Rate and Handgun Licenses Issued per 100,000 population for the years 1961 to 1973 in Detroit (Fisher, 1976, reprinted from Gunst and Mason, 1980).

     Data for simple linear regression (handgun study)

     Year    Homicide    Handguns
             Rate        Registered
     1961      8.60       178.15
     1962      8.90       156.41
     1963      8.52       198.02
     1964      8.89       222.10
     1965     13.07       301.92
     1966     14.57       391.22
     1967     21.36       665.56
     1968     28.03      1131.21
     1969     31.49       837.60
     1970     37.39       794.90
     1971     46.26       817.74
     1972     47.24       583.17
     1973     52.33       709.59

Since you want to compare homicide rate with handguns registered, you need a database with only these two sets of numbers, and can exclude year. The data in the database will be from the table above, excluding the year column. The database will include two fields (Homicide Rate and Guns Registered) and thirteen records (one for each year).

     FIELD   NAME        TYPE      WIDTH   DECIMALS
       1     HOMICIDES   Numeric     6        2
       2     HANDGUNS    Numeric     8        2

The data you will enter in the first record is 8.60 (press Enter) and 178.15 (press Enter), and so on.

PERFORMING THE ANALYSIS

Enter the Regression module from the Analyze menu, and choose the Simple Linear Regression option. You will be prompted to enter the INDEPENDENT (X) variable, which in this case is HANDGUNS. Enter 2, which chooses HANDGUNS. Next, you will be asked for the DEPENDENT (Y) variable. Enter 1, which chooses HOMICIDES. KWIKSTAT will now perform the calculations and display the results on the screen.

Pearson's correlation coefficient (r) is reported (0.7263) as well as R2 (R-Square, 0.5275).
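The figures reported for Example 4.19 can be cross-checked with an ordinary least-squares calculation on the table above. The following is a minimal sketch in Python, offered as an illustration rather than as KWIKSTAT's implementation; the function name simple_linear_regression is an assumption.

```python
import math

# Example 4.19 data, 1961 through 1973, in the order of the table.
HOMICIDES = [8.60, 8.90, 8.52, 8.89, 13.07, 14.57, 21.36,
             28.03, 31.49, 37.39, 46.26, 47.24, 52.33]
HANDGUNS = [178.15, 156.41, 198.02, 222.10, 301.92, 391.22, 665.56,
            1131.21, 837.60, 794.90, 817.74, 583.17, 709.59]


def simple_linear_regression(x, y):
    """Return (intercept, slope, r) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)       # Pearson correlation coefficient
    return intercept, slope, r


intercept, slope, r = simple_linear_regression(HANDGUNS, HOMICIDES)
predicted_300 = intercept + slope * 300  # predicted rate at 300 handguns

print("r =", round(r, 4), " R-Square =", round(r * r, 4))
print("HOMICIDES =", round(intercept, 4), "+", round(slope, 6), "* HANDGUNS")
print("prediction at HANDGUNS = 300:", round(predicted_300, 2))
```

The independent variable (HANDGUNS) is passed first and the dependent variable (HOMICIDES) second, mirroring the order in which KWIKSTAT prompts for them.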
The linear regression equation given is a mathematical representation of a straight line that passes through a plot of the data, and can be used to predict the dependent variable (HOMICIDES) given a value for the independent variable (HANDGUNS). In this case the linear regression equation is: HOMICIDES = 4.910512 + 3.761144E-02 * HANDGUNS If you want to predict the homicide rate for 300 handguns registered, you would use the equation: HOMICIDES = 4.910512 + 3.761144E-02 * 300 A t-test is performed to test the statistical significance of the linear relationship between the two variables. A low p-value means that the two variables are significantly related. In this case p=0.005, quite small, so the null hypothesis (Slope = 0) is rejected and you conclude that the regression line has a slope significantly different from zero. The program also allows you to view a scatterplot of the data with the fitted line and a plot of the residuals. MULTIPLE REGRESSION Multiple regression is an extension of simple linear regression into several dimensions (several independent variables). In the multiple regression procedure, you must enter a list of the independent variables and a single dependent variable on which you wish to perform the regression analysis. In KWIKSTAT you may use up to 10 independent variables in this option. An analysis of variance is performed to determine the overall significance of the model. If the ANOVA reveals a significant relationship, (that is, if the p-value is small) the model may be a good representation of the sample data. A plot of residuals from the fit is available. You may plot the fit against any of the variables. Look for patterns in the residuals. Patterns other than a horizontal band about zero suggest that the assumptions necessary for regression analysis may be violated. If you are unfamiliar with multiple regression, the Neter and Wasserman book contains an excellent treatment. 
EXAMPLE 4.20: MULTIPLE REGRESSION ANALYSIS (LONGLEY DATA)

Longley introduced a data set which has often been used in comparing multiple linear regression procedures in the literature. The variables refer to economic factors. This example uses the LONGLEY database on the KWIKSTAT disk.

The LONGLEY database consists of 7 fields: DEFLATOR, GNP, UNEMP, ARMED, POP, TIME, and TOTAL. The first six of these will be used as independent variables and the seventh, TOTAL, is the dependent variable (the one to be predicted). Figure 4.15 displays the LONGLEY database. You can get this display by using the "List (display) the contents of a database" option on the Data main menu.

PERFORMING THE ANALYSIS

In the Multiple Regression procedure, you will be prompted to enter the INDEPENDENT VARIABLE(S), which in this case are DEFLATOR, GNP, UNEMP, ARMED, POP, TIME. Enter any combination of 1,2,3,4,5,6 to choose the variable(s) you wish to analyze against TOTAL. One way to approach a multiple regression problem is to first include all of the independent variables. After initial analysis (see below) you may decide to eliminate those independent variables found to not be significant.

After entering the independent variables, you will be asked for the DEPENDENT VARIABLE. Enter 7, which chooses TOTAL. KWIKSTAT will now perform the calculations and display the results on the screen, as illustrated in Figure 4.16.

The table at the top of the screen (in Figure 4.16) tells you the intercept value and the coefficient values for each of the independent variables. These can be used to create an equation for prediction of the dependent variable. In this case, the equation is:

     TOTAL = -3481930.1065
             + DEFLATOR*(15.0161517122)
             + GNP*(-0.03579443400)
             + UNEMP*(-2.0199053296)
             + ARMED*(-1.0332049046)
             + POP*(-0.05130725587)
             + TIME*(1828.99249535)

The t-value associated with each coefficient tests its significance in the equation.
You can use the p-value associated with each coefficient to make a decision about the validity of having that variable in the equation. A low p-value suggests that the dependent variable, TOTAL, is related to the independent variable whose p-value you are examining. In this case, you might question the validity of having DEFLATOR (p=0.8636), GNP (p=0.3132) and POP (p=0.8257) in the equation.

In choosing the variables to have in such an equation, you also need to consider such questions as multicollinearity, heteroscedasticity and parsimony. If you wish to delete some variables from the equation, you can do so by re-running the analysis and leaving out some variables.

The Multiple Regression procedure also allows you to plot residuals and to calculate predicted values using the prediction equation.

CORRELATION ANALYSIS

The correlation coefficient is a measure of the strength of the linear relationship between two variables. KWIKSTAT allows you to find both Pearson's and Spearman's (rank) correlation coefficients of two variables. It also displays the matrix of correlation coefficients of pairs of variables when there are more than two variables being considered.

EXAMPLE 4.22: CORRELATION MATRIX (LONGLEY DATA)

This example uses the LONGLEY database on disk. You will be prompted to choose variables from the list of fields that appears. In this case, there are seven fields, and you can choose any combination of them. If you want correlation coefficients of all pairs of the seven variables, type

     1,2,3,4,5,6,7

and press Enter. KWIKSTAT will perform the calculations and display the 7 by 7 array shown in Figure 4.17. Only half of the array is displayed since the other half is a mirror image. The diagonal entries are also omitted since they are all one; a variable is always perfectly correlated with itself.

Each entry in the array consists of two numbers (three numbers if the information is printed to a printer).
The first (upper) is the Pearson's correlation coefficient for the two (row and column) variables of that entry. The second (middle) number, in parentheses, is the p-value of the t-test for Ho: rho = 0 vs. Ha: rho <> 0. In the hard copy printout (if requested), the third (bottom) number, in brackets, is the sample size, or number of paired observations used in the calculations.

EXAMPLE 4.23: GRAPHICAL CORRELATION MATRIX (LONGLEY DATA)

This example also uses the Longley data. You will be prompted to choose variables from the list of fields that appears. In this case, there are seven fields, and you can choose any combination of them. If you want correlation coefficients of all pairs of the seven variables, type

     1,2,3,4,5,6,7

and press Enter. KWIKSTAT will perform the calculations and display the 7 by 7 array of scatterplots. These scatterplots are a visual way of examining the relationships between pairs of variables.

USING FREQUENCY AND CROSSTABULATION PROCEDURES

The Crosstabulations, Frequencies, Chi Square module performs analyses on categorical data, that is, data observed in categories, rather than measurement data. Previous examples using measurement data include weights of hogs, weights of people, heights of plants, numbers of handguns and homicides, and dollar amounts. If, rather than taking a measurement, a data observation involves identifying which of a set of categories the observation falls into, you are working with categorical data.

Generally, categorical data are entered into a database by using one record for each person or entity on which the observation is made and one field for each characteristic which is divided into categories. For example, to categorize ten people by sex, hair color and eye color, you would need ten records (one per person) and three fields (e.g., SEX, HAIR, EYE).
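The one-record-per-person layout can be illustrated with a small sketch in Python. The ten records below are hypothetical values invented for this illustration (they do not come from any KWIKSTAT database), and the function names frequencies and crosstab are likewise assumptions.

```python
from collections import Counter

FIELDS = ("SEX", "HAIR", "EYE")

# Hypothetical data, invented for this sketch: one tuple per person,
# one position per categorical field.
PEOPLE = [
    ("F", "BROWN", "BROWN"),
    ("M", "BLOND", "BLUE"),
    ("F", "BLACK", "BROWN"),
    ("M", "BROWN", "GREEN"),
    ("F", "BLOND", "BLUE"),
    ("M", "BROWN", "BROWN"),
    ("F", "RED", "GREEN"),
    ("M", "BLACK", "BROWN"),
    ("F", "BROWN", "BLUE"),
    ("M", "BROWN", "BROWN"),
]


def frequencies(records, field):
    """Count how often each category of one field occurs."""
    index = FIELDS.index(field)
    return Counter(record[index] for record in records)


def crosstab(records, row_field, col_field):
    """Count each (row category, column category) combination."""
    r, c = FIELDS.index(row_field), FIELDS.index(col_field)
    return Counter((record[r], record[c]) for record in records)


eye_counts = frequencies(PEOPLE, "EYE")
sex_by_eye = crosstab(PEOPLE, "SEX", "EYE")
print(dict(eye_counts))
print(dict(sex_by_eye))
```

A frequency table counts one field at a time, while a crosstabulation counts combinations of two fields taken from the same records.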
Some of the procedures in this module give you the choice of simply entering totals for each category rather than creating a database and entering the results of each observation. This can save time if totals are known and only totals are needed to perform a test or calculation or to produce a graph.

PERFORMING A FREQUENCIES ANALYSIS

In the Frequencies, Pictograph, Pie Chart option, KWIKSTAT "counts" the occurrence of each data value for a single variable or field and displays that information in a table. You can also create a bar chart, pictograph and/or pie chart of this information using this option.

EXAMPLE 4.24: FREQUENCY TABLE, PICTOGRAPH, BAR AND PIE CHARTS

This example uses the EXAMPLE database file on the KWIKSTAT disk. One of the fields (variables) in this database is STATUS, referring to socioeconomic status. Suppose you want to know how the total data set is divided up into the five levels of STATUS.

You will be prompted to enter one field (variable) to use. Since you want to do a frequency table on STATUS, enter 7. KWIKSTAT will count the data in each of the five categories of STATUS and display the results as a frequency table, shown in Figure 4.19. You are then prompted to press Enter, which takes you to the Frequencies Analysis menu. From this menu you may choose to print your table, go back and do another analysis, or create charts. (Select an option by using the up and down arrow keys to highlight the desired option and pressing Enter.)

You may choose to display a pie chart or pictograph. Selecting Bar Chart/Pictograph takes you to another menu from which you can select the type of chart you want to produce. BAR1 gives regular size bars and BAR2 gives wide bars on a bar chart. Each of the four options BEETLE, CAT, PC, PERSON gives a pictograph whose symbol is the item listed. (BEETLE is the car, not the insect.)
A disappearing menu at the bottom of the screen offers the options to replot (press R), print (press P), or exit the pictograph or bar chart (press Esc). Press Enter to retrieve the bottom menu.

PERFORMING A GOODNESS OF FIT ANALYSIS

A goodness-of-fit test of a single population is a test to determine if the distribution of observed frequencies in the sample data closely matches the expected number of occurrences under a hypothetical distribution of the population. The hypotheses being tested are

Ho: The population distribution follows the hypothesised distribution.
Ha: The population does not follow the hypothesised distribution.

EXAMPLE 4.25: GOODNESS-OF-FIT ANALYSIS

The data for this example come from the text by Zar, 1974, page 46. According to a genetic theory, crossbred pea plants show a 9:3:3:1 ratio of yellow smooth, yellow wrinkled, green smooth, green wrinkled offspring. Out of 250 plants, under the theoretical ratio (distribution) of 9:3:3:1, you would expect about (9/16)x250=140.625 to produce yellow smooth peas, (3/16)x250=46.875 yellow wrinkled, (3/16)x250=46.875 green smooth, (1/16)x250=15.625 green wrinkled. After growing 250 of these pea plants, you observe that 152 have yellow smooth peas, 39 have yellow wrinkled, 53 have green smooth, 6 have green wrinkled.

PERFORMING THE ANALYSIS

You will be prompted to enter the number of categories. In this case, type 4 for the four categories of peas (yellow smooth, yellow wrinkled, green smooth, green wrinkled) and press Enter.

You will also be asked if you want to enter the expected ratios, or if you will be entering the actual expected values into the table. If you choose to enter ratios, you will enter 9,3,3,1.

An empty table will appear with the instructions to enter the observed values for each category. Enter the observed values given above, pressing Enter after each entry.
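The arithmetic behind this example can be checked with a short Python sketch (Python is not part of KWIKSTAT; the function names below are made up for illustration). It turns the 9:3:3:1 ratios into expected counts, sums (observed - expected)^2 / expected over the four categories, and looks up the tail probability for 3 degrees of freedom (categories minus one). The closed-form tail formula is valid only for exactly 3 degrees of freedom, and the p-value it prints may differ slightly in the third decimal from the figure reported for this example, depending on how the tail probability is approximated.

```python
import math


def goodness_of_fit(observed, ratios):
    """Return (expected counts, chi-square statistic) for hypothesised ratios."""
    total = sum(observed)
    ratio_sum = sum(ratios)
    expected = [total * r / ratio_sum for r in ratios]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi_square


def chi_square_tail_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


# Observed pea counts: yellow smooth, yellow wrinkled, green smooth, green wrinkled
observed = [152, 39, 53, 6]
expected, chi_square = goodness_of_fit(observed, [9, 3, 3, 1])
p_value = chi_square_tail_3df(chi_square)

if __name__ == "__main__":
    print("expected counts:", expected)
    print("chi-square:", round(chi_square, 2))
    print("p-value:", p_value)
```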
For example, for the first row, enter 152 for observed (press Enter), enter 39 (press Enter), and so on. KWIKSTAT will perform the calculations (including filling in the expected values column) and display the results. The calculated chi-square statistic in this case is 8.97 and the p-value is 0.031.

PERFORMING A CROSSTABULATION ANALYSIS (CHI-SQUARE)

Crosstabulations can be used to perform a chi-square test for independence or a chi-square test for homogeneity. A two-way table is constructed that displays the number of counts for each category. It must be possible to assume that the data observations are independent and that each data value can be counted in one and only one category. It is also assumed that the number of observations is fixed.

KWIKSTAT allows you to enter data for a two-way table from the keyboard or from a database. When you choose to enter the two-way table from the keyboard, KWIKSTAT will ask you the size of the table (number of rows and columns). A blank table will be presented on the screen, and you will then be prompted to enter a number in each cell of the table.

If you choose to enter the information from a database, KWIKSTAT will prompt you to choose the two variables (fields) from the currently active database that you wish to tabulate. KWIKSTAT will read the information from the database and construct the table. For instance, in the EXAMPLE database, if you choose to tabulate the variables GROUP and STATUS, KWIKSTAT will form the table on the screen as illustrated in Figure 4.23. (Note that the first variable entered is the row variable.)

For a test for independence, a contingency table looks at two categorical variables from a single sample of one population and tests whether the two variables are related in some way (e.g., are sex and hair color related?). The hypotheses being tested are

Ho: The variables are independent of each other. (There is no association between them.)
Ha: The variables are not independent of each other.

KWIKSTAT reports both the chi-square statistic and the p-value. If the expected value in one or more cells is less than 5, the chi-square test may not be valid. A warning to this effect appears on the screen if appropriate. In the case of a 2 by 2 table, Fisher's Exact Test and the chi-square with Yates' correction are also performed and the results displayed.

SCREEN LIMITATIONS: The limits to tables being displayed on the screen are 10 columns by 7 rows. If the table is too big for the screen, only the test results are displayed. Tables as large as 15 columns by 100 rows may be printed on a line printer if data are entered from a database.

EXAMPLE 4.26: CROSSTABULATION ANALYSIS (2 BY 2) TEST FOR INDEPENDENCE

Data for this example are observations of the number of beetles and bugs on the upper and lower sides of leaves (Zar, 1974, page 292).

2 BY 2 CONTINGENCY TABLE DATA

              Beetles   Bugs
              ---------------
Upper Leaf       12       7
Lower Leaf        2       8

Since you are given only the totals for each of the four categories, and not the individual data for each leaf, there is no need to create a database. Rather, you can just enter these totals from the keyboard.

PERFORMING THE ANALYSIS

When you choose the crosstabulations option, you will be asked if you want to enter data from a (D)atabase or (K)eyboard. Type K and press Enter. You will then be prompted to give the size of the table. Enter 2 for rows and 2 for columns.

An empty table will appear with the instructions to enter the counts for each category into the appropriate cell. Enter the values given above, pressing Enter after each entry.

The calculated chi-square statistic in this case is 4.89 with a p-value of 0.028. The chi-square with Yates' correction is 3.31 with a p-value of 0.069 and the Fisher Exact Test (two tail) has a p-value of 0.050.
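For readers who want to check these 2 by 2 results by hand, the following Python sketch (not part of KWIKSTAT; all function names are hypothetical) computes the expected cell counts, the chi-square statistic with and without Yates' correction, and a two-tail Fisher exact p-value. It assumes the common definition of the two-tail Fisher test, which sums the probabilities of all tables that are no more likely than the observed one; KWIKSTAT's internal method is not documented here and may differ. The p-values for the two chi-square statistics may differ slightly in the last digit from those reported above.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_2x2(table, yates=False):
    """Chi-square statistic and p-value (1 df) for a 2 by 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)  # Yates' continuity correction
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail for 1 df
    return stat, p_value


def fisher_exact_two_tail(table):
    """Sum hypergeometric probabilities of tables no more likely than observed."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(k):
        return math.comb(r1, k) * math.comb(r2, c1 - k) / math.comb(n, c1)

    p_observed = prob(a)
    low, high = max(0, c1 - r2), min(r1, c1)
    return sum(prob(k) for k in range(low, high + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# Rows: upper leaf, lower leaf. Columns: beetles, bugs.
table = [[12, 7], [2, 8]]

if __name__ == "__main__":
    print("expected:", expected_counts(table))
    print("chi-square:", chi_square_2x2(table))
    print("Yates:", chi_square_2x2(table, yates=True))
    print("Fisher two-tail:", fisher_exact_two_tail(table))
```

The smallest expected count, for beetles on the lower leaf, is 10 x 14 / 29, which is just under 5.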
Because one of the cells produces an expected value less than 5, KWIKSTAT gives a warning that the chi-square analysis for these data may not be valid. Given this warning, it is best to rely on Fisher's Exact Test for making a decision.

If you choose Output to Printer or File, each cell in the output will contain five numbers. The top number is the count that you entered. The second number is the calculated expected value used to calculate the chi-square statistic. The third number is the percentage of the TOTAL number of observations that the observed number in that cell represents. The fourth and fifth numbers are percentages of the ROW and COLUMN totals that the observed number in that cell represents.

EXAMPLE 4.27: CONTINGENCY TABLE LARGER THAN 2 BY 2 (SEX BY HAIR COLOR)

A generalization of the 2 by 2 table is the R by C (Rows by Columns) table. This is an example (Zar, 1984, page 62) of a 2 by 4 contingency table involving the variables hair color and sex. The null hypothesis is that there is no relationship between hair color and sex.

2 BY 4 CONTINGENCY TABLE DATA (SEX BY HAIR COLOR)

                    HAIR COLOR
          Black    Brown    Blonde    Red
-----------------------------------------
Male        32       43       16        9
Female      55       65       64       16

KWIKSTAT will perform the calculations and display the results as shown in Figure 4.25. The calculated chi-square statistic in this case is 8.99 with a p-value of 0.03.

DRAWING A 3-D BAR CHART

KWIKSTAT allows you to draw a 3-dimensional bar chart of data for a contingency table (crosstabulation), and then to focus in on a part of it if desired.

Data for the 3-dimensional bar chart must be entered first, either from the keyboard or a database, by using the Crosstabulations, Chi-Square option of the Frequencies and Crosstabulations Choose Analysis Option menu. To get to this menu from the Data main menu, select Analyze at the top of the screen, and then select Crosstabulations, Frequencies, Chi Square.
EXAMPLE 4.28: DRAWING A 3-D BAR CHART (EXAMPLE DATABASE)

Check the lower left corner of the screen to see which database is currently in use. If it is the one you want, EXAMPLE, go on to Producing the Chart. If it is not EXAMPLE, retrieve the EXAMPLE database.

PRODUCING THE CHART

Choose the option D) CROSSTABULATIONS, CHI-SQUARE from the menu and specify that data will be entered from a database. You will be asked to enter the variables to analyze. Choose the variables GROUP and STATUS by entering 1,7. KWIKSTAT will display the results of the chi-square test in a contingency table. Pressing Enter will bring you to the Crosstabulations Analysis menu. Using the up and down arrow keys, highlight 3-D Bar Chart and press Enter.

You will be given a default title for the chart and prompted to enter an alternative if desired. You will be similarly prompted for labels for "row" and "column" axes. Default labels are also given.

The disappearing menu at the bottom of the screen gives you the options of (R)eplot, (P)rint, or (D)etail. (Press Enter to make it reappear.) Pressing D selects Detail, which allows you to look at a part of the chart in isolation. For example, suppose in this case you want to see a detail of the second category (B) in the "row" field (GROUP) by all categories (1,2,3,4,5) of the "column" field (STATUS). Press D for detail and you will be prompted to specify the detail parameters. When prompted for the first and last row, type 2,2 (that is, begin at row 2 and end at row 2). When prompted for the first and last column, type 1,5 (begin at column 1 and end at column 5). Figure 4.27 shows the display of the detail requested. To exit from the plot, press Esc.

MCNEMAR'S TEST

McNemar's test is appropriate for use with paired, dichotomous data (i.e., 0/1 data). This test is sometimes called a test for related samples or a test for the significance of changes.
It is useful for comparing paired or related observations in which the response is dichotomous, that is, the response is one of only two possible outcomes. McNemar's test is the 2 by 2 version of Cochran's Q test described in the section on non-parametric tests. The test assumes that any pair of observations is independent of any other pair of observations, although clearly the observations within a pair are not independent of each other.

USING LIFE TABLES AND SURVIVAL ANALYSIS PROCEDURES

As the name indicates, this module performs life tables and survival analysis procedures. The data must be in the following form:

1) a TIME variable which contains a time (e.g., minutes, days, years, etc.) in which the subject or component has been observed to be alive (not failed).

2) a CENSOR variable which must take on the values 0 or 1, where 1 means the subject has died (failed), and a 0 means the subject was still alive (not failed) at the last available time period.

3) optionally, a GROUPING variable which may have up to ten values (numeric or character), i.e., the data may be in groups.

Once the data are entered into the program, a life table for each group is produced which includes, for each time interval, the number entered, withdrawn, lost, dead, exposed, the proportion dead, proportion surviving, cumulative proportion surviving, hazard and density. A plot is given for the cumulative proportion surviving in the group(s) against time. If more than one group is entered, a Mantel-Haenszel test is performed to test the hypothesis of equal survival patterns for the groups.

A small version of the survival plot will appear on the screen, and if you choose to print a report of the session the report will include a larger version of the plot along with other information from the analysis.

EXAMPLE 4.31A: LIFE TABLE ANALYSIS

The data for this example are in the LIFE database on the KWIKSTAT disk. These data are from Prentice (1973).
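To make the TIME/CENSOR data layout and the interval columns more concrete, here is a minimal Python sketch of a life table calculation. It is not KWIKSTAT's code. It assumes the usual actuarial convention that subjects withdrawn during an interval count as exposed for half of it (exposed = entered - withdrawn/2), and it reports the cumulative proportion surviving to the end of each interval. The records, the interval width of 20 and the function name life_table are hypothetical.

```python
def life_table(records, width):
    """Build an actuarial life table from (time, censor) pairs.

    censor is 1 if the subject died (failed) at that time and 0 if the
    subject was still alive (withdrawn) when last observed.
    """
    n_intervals = int(max(t for t, _ in records) // width) + 1
    rows = []
    cum_surviving = 1.0
    for i in range(n_intervals):
        start, end = i * width, (i + 1) * width
        entered = sum(1 for t, _ in records if t >= start)
        in_interval = [c for t, c in records if start <= t < end]
        dead = sum(1 for c in in_interval if c == 1)
        withdrawn = sum(1 for c in in_interval if c == 0)
        # Assumed convention: a withdrawal counts as half an exposure.
        exposed = entered - withdrawn / 2
        prop_dead = dead / exposed if exposed > 0 else 0.0
        cum_surviving *= 1 - prop_dead
        rows.append({
            "interval": (start, end),
            "entered": entered,
            "withdrawn": withdrawn,
            "dead": dead,
            "exposed": exposed,
            "prop_dead": prop_dead,
            "cum_surviving": cum_surviving,
        })
    return rows


# Hypothetical data: (TIME, CENSOR) for eight subjects in one group.
records = [(5, 1), (12, 1), (20, 0), (25, 1), (31, 1), (40, 0), (44, 1), (58, 1)]
rows = life_table(records, width=20)

if __name__ == "__main__":
    for row in rows:
        print(row)
```

Under this convention a single withdrawal produces a half-unit exposed count, such as 5.5 in the second interval of the hypothetical data.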
Open the LIFE database, and begin the Life Table Module from the Analyze menu. Use the up and down cursor keys to highlight Life Tables and Survival Analysis and press Enter, or simply press B.

You will be prompted to choose a time variable indicating the amount of survival time observed, and a censor variable indicating which subjects are still alive (censored) at the last time period. In this case, enter 1,2 to choose SURVIVAL as the time variable and CENSOR as the censor variable.

You will then be prompted to choose a grouping variable if you wish. If there is no grouping variable or you don't wish to group the data, simply press Enter. In this case, type 3 to choose GROUP as the grouping variable, and press Enter.

KWIKSTAT reports the names and sizes of the groups and then asks you to specify the length of each interval for the table to be produced. You can specify a desired interval length or you can use the default length by simply pressing Enter.

KWIKSTAT will perform the calculations and display a table which includes the numbers of subjects entered alive, withdrawn, dead, exposed, the proportion dead, proportion alive, cumulative survival proportion and standard error for the first group. Press Enter and a second table appears, which includes 95% confidence limits on the cumulative survival proportion. A summary of the upper table also appears. KWIKSTAT now creates the two similar tables for the second group.

From the tables, you can see that, in the first group, 22 of 37 exposed (59.5%) died in the first interval (0.0-99.0) and two were withdrawn. In the second group, 12 of 23.5 exposed (51.1%) died and one was withdrawn in the first interval.

KWIKSTAT also draws a small graph of the two survival curves, and performs the Mantel-Haenszel comparison of the two curves, testing the hypotheses:

Ho: The survival curves are the same.
Ha: The survival curves are not the same.
A chi-square statistic is reported, as well as a p-value. A low p-value is taken to indicate rejection of the null hypothesis. In this example, the Mantel-Haenszel comparison procedure results in a chi-square statistic of 0.7191 and a p-value of 0.397. This p-value is too large to reject the hypothesis of equal curves. This indicates that the two distributions are not significantly different - thus neither treatment is superior in terms of survival distributions.

At the end of a survival analysis, you will be asked if you want to print the results to the printer.

PART V - USING KWIKSTAT UTILITIES
--------------------------------------------------------

The KWIKSTAT UTILITY module contains a number of utilities that do not fit into any of the other modules.

EXPORTING DATA

You may output the data from your KWIKSTAT (DBF) file into a standard ASCII text file (often called an SDF file, for Standard Data Format).

PRINTING A REPORT

You may output a listing of the data in the database (or a selected subset of the database) by using the report facility. To use this procedure, choose the Print REPORT to printer or file option from the Utilities menu.

IMPORTING DATA FROM 1-2-3 TYPE FILES

This option is useful for translating 1-2-3 files into a DBF format that can be used by the KWIKSTAT program. Begin this option by choosing the Convert WKS file to DBF option on the Utilities Menu. This translation facility will translate most versions of WK* files. An example file on disk to translate is TEST.WKS, which contains data in cells A1.H6. The import program will not allow you to specify more than 128 columns to translate into a DBF file.

CREATING AND EDITING KWIKSTAT IMAGES

The image program allows you to create or edit images to be used by the Pictograph procedure. When you begin the IMAGE module, you will be asked if you want to create a NEW image, or to edit an OLD image.
If you choose to enter a NEW image, you will be asked the pixel size. Maximum size for an image is 40 pixels (dots) wide and 30 pixels high. The Pictograph routine will adjust its graph according to the size of the image.

In the editor, you may move the cursor around the grid, and select to fill a dot by pressing the numbers 1, 2 or 3. To unfill a dot, place the cursor at the dot and press the space bar or 0. You will see a version of the image in its correct size at the upper right of the screen.

Once you have created or edited an image, choose the (S)ave option to write the information to disk.

APPENDIX

INTERPRETING ERROR CODES:

If the program encounters a problem it does not know how to resolve, it will usually display an error message containing an error and reference code. Many times, you can correct this error situation by understanding what caused it. For example, if you were to get an error number 27, you would know that it was caused by your printer sending an "Out of Paper" message to the program. If you are unable to resolve a problem, please write down the steps you took before the error was encountered, and send it in on the Problem Report Form.

ERROR CODES:

 5=Illegal function call       57=Device I/O error
 6=Overflow                    58=File already exists
 7=Out of Memory               61=Disk full
 9=Subscript out of range      62=Input past end of file
11=Division by zero            63=Bad record number
14=Out of String Space         64=Bad filename
24=Device Timeout              67=Too many files
25=Device fault                68=Device unavailable
27=Out of Paper                70=Permission denied
50=FIELD overflow              71=Disk not ready
51=Internal Error              72=Disk media error
52=Bad filename or number      74=Rename across disks
53=File not found              75=Path/File access error
54=Bad file mode               76=Path not found
55=File already open           81=Invalid filename

Problem Report form: KWIKSTAT

Please explain in detail the problem that occurred.
If possible, send a print out of the results or Print Screen.

KWIKSTAT VERSION YOU ARE USING:________________________
KWIKSTAT MODULE where problem occurred:____________________
YOUR COMPUTER: BRAND/Model_____________________________
MONITOR TYPE:________ AMOUNT OF MEMORY:_______________
VERSION OF DOS YOU ARE USING:____________________________
MEMORY RESIDENT PROGRAMS YOU USE:____________________

PROBLEM:

Mail to: TexaSoft, P.O. Box 1169, Cedar Hill, Texas 75104.
Or fax to 214-291-3400, or send E-Mail to Compuserve 70721,3145.

USER'S BALLOT

Indicate your preference for improvements in KWIKSTAT.
On a scale of 0 to 10: 0 = Low priority, 10 = High priority

Vote  Proposed change
----  -----------------------------------------------------
____  Allow "Spreadsheet-like" entry of data
____  Ability to sort database
____  More ANOVA types
____  More Non-parametric statistical tests
____  General Linear Model
____  Make Report more flexible
____  Stem and leaf plot
____  Quality Control Module
____  More speed
____  More graphics
____  Improve graphic quality
____  Cluster analysis
____  Discriminant analysis
____  Automate analysis from a command file
____  _____________________________________________
____  _____________________________________________

Comments:

Mail to: TexaSoft, P.O. Box 1169, Cedar Hill, Texas 75104.
Fax to: 214-291-3400 or send E-Mail to Compuserve 70721,3145.