Monster Media 1993 #2


Text File | 1992-12-21 | 31KB | 616 lines

kspdat 2.10

Joseph C. Hudson
4903 Algonquin
Clarkston, MI 48348


Introduction

kspdat is a contraction of ks probability data. Introductory prob and stat textbooks usually have a few tables for common distributions and a few pictures of probability and density functions. Occasionally, cdfs and oc curves may be seen. kspdat is a first attempt at allowing prob and stat instructors to use the tables and pictures they want to use, rather than being restricted to those that the author chose.

kspdat does not produce pictures, or directly usable tables, for that matter. It does produce tables of pdfs (probability or density functions), cdfs, hazard functions, reliability (survival) functions and inverse cdfs. The output can be edited into replacements for a book's tables (like when you want to pass out tables for testing purposes without violating someone's copyright), alternate forms for these tables, additional tables, or, most important for me, output can be fed into a graphing program to produce pictures.

I do not offer a warranty or guarantee of any kind for this program. I've tried hard to make the output correct, but using it with new data sets and different machines may reveal errors I'm not aware of. Follow the advice of Gerard E. Dallal (Statistical Microcomputing - Like It Is, American Statistician, V42 N3 Aug 1988): assume that this program does everything wrong until you put it through its paces with difficult input and conclude otherwise.

Above all, enjoy. If you care to send me a brief report about what you like and don't like about this program, it would be very much appreciated.

kspdat is copyright (C) 1990-93 Joseph C. Hudson, 4903 Algonquin, Clarkston MI 48348. All rights are reserved.


examples of use

Let's start with a couple of quick examples to illustrate what kspdat does. Start kspdat.
You see the main menu:

┌──────────────────────────────────────────────────────────────┐
│ kspdat 2.10                                                  │
│                                                              │
│ exit     help   save spec  get spec   view data   view dfile │
│ compute  dir    save data  view file  view names  view cfile │
│                                                   view graph │
│                                                              │
│ cols to graph:                                               │
│ data file:                                                   │
│ indep var:                                                   │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
└──────────────────────────────────────────────────────────────┘

The cursor will be on "help". If you hit return, you will be able to page through a series of help screens. Don't bother right now. Instead, use the down arrow to move to the "data file:" prompt. The help box at the bottom of the screen will ask you to enter the name of a file to use for output. Do it. For this exercise, use bintable, prefaced by the drive/subdirectory you want the output to go to.

If you hit return or the down arrow after entering the name, the cursor will be at the "indep var:" prompt. If not, use the arrow keys to move the cursor there. Type in

    x 0(1)10

Hit return or down arrow when done. The cursor will go to the blank area below and prompt you with "c2:". Type in

    bi 10 .2 cdf

Hit return or down arrow when done. You will get a "c3:" prompt. Hold the alt key down and type the letter C. The bi 10 .2 cdf should have been copied from the line above. Arrow over to the 2 and change it to a 4, so that the line reads

    bi 10 .4 cdf

Repeat twice, to get

    c4: bi 10 .6 cdf
    c5: bi 10 .8 cdf

After you get the last line, hold the alt key down and type the letter Q (I'll refer to this simply as alt-Q or alt-whatever from now on). The program should put you at the "compute" prompt. Hit return. After the computer is done working, you will be at "view data". Hit return. You will see the data that was just generated. There will be some strange names heading each column.

Hit Esc to get back to the main menu, go left and down with the arrow keys to the "save data" prompt and hit return.
A new menu will appear:

┌──────────────────────────────────────────────────────────────┐
│ kspdat save data     2/26/90 20:06     390K free mem         │
│                                                              │
│ missing values: mvcode          number of tables: one        │
│                                                              │
│ col column     number of blank   number of spaces for digits │
│ no. name       leading spaces    before d.p.   after d.p.    │
│  1  x0(1)10          0               0             0         │
│  2  bic10.2          0               0             0         │
│  3  bic10.4          0               0             0         │
│  4  bic10.6          0               0             0         │
│  5  bic10.8          0               0             0         │
│                                                              │
│                                                              │
│                                                              │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Leave the missing values and number of tables as they are (hit return to get past them) and change the zeros to these values:

│  1  x0(1)10          2               2             0         │
│  2  bic10.2          2               1             4         │
│  3  bic10.4          2               1             4         │
│  4  bic10.6          2               1             4         │
│  5  bic10.8          2               1             4         │

Hit alt-Q when done, and then return. After a bit, you'll be back at the main menu at "view dfile". Hit return, and you will see the fruits of your labor:

       0  0.1074  0.0060  0.0001  0.0000
       1  0.3758  0.0464  0.0017  0.0000
       2  0.6778  0.1673  0.0123  0.0001
       3  0.8791  0.3823  0.0548  0.0009
       4  0.9672  0.6331  0.1662  0.0064
       5  0.9936  0.8338  0.3669  0.0328
       6  0.9991  0.9452  0.6177  0.1209
       7  0.9999  0.9877  0.8327  0.3222
       8  1.0000  0.9983  0.9536  0.6242
       9  1.0000  0.9999  0.9940  0.8926
      10  1.0000  1.0000  1.0000  1.0000

The first column of output contains the x values from 0 to 10, the second column contains binomial cdf values for p = .2, and the remaining columns contain binomial cdf values for p = .4, .6 and .8. Take this into a text editor, add headers and you have a (small) binomial table.

Exit kspdat by going to "exit" in the menu and typing X.

The usual t, F and chi-square tables of upper tail areas can be constructed by specifying df as the independent variable. Let's run through the making of a small t table. Start kspdat and arrow down to the "indep var:" prompt. Type in

    df 1(4)5(5)30(10)40(20)120

Hit return or the down arrow when done and you will be at the "c1:" prompt. Type in

    st .01 x

Hit return or down arrow and you will go to the next line and get the "c2:" prompt.
Type alt-C to copy the line above, then right arrow over to the 1 and change it to a 5, to get

    st .05 x

Hit return or down arrow. At the "c3:" prompt, do your thing again to get

    st .10 x

Now hit alt-Q to get to the "compute" prompt and hit return. When computations are complete, you will be at "view data". Hit return and you should see

                  1          2         3         4
              df1(4)5(5    stx.01    stx.05    stx.10
       1       1.0000     31.8205    6.3138    3.0777
       2       5.0000      3.3649    2.0150    1.4759
       3      10.0000      2.7638    1.8125    1.3722
       4      15.0000      2.6025    1.7531    1.3406
       5      20.0000      2.5280    1.7247    1.3253
       6      25.0000      2.4851    1.7081    1.3163
       7      30.0000      2.4573    1.6973    1.3104
       8      40.0000      2.4233    1.6839    1.3031
       9      60.0000      2.3901    1.6706    1.2958
      10      80.0000      2.3739    1.6641    1.2922
      11     100.0000      2.3642    1.6602    1.2901
      12     120.0000      2.3578    1.6577    1.2886

Hit Esc to get back to the menu. Arrow down to the "data file:" prompt and enter a path and file name (with no extension). Then arrow up and across to "save data". Hit return, then hit return twice to begin editing the numbers of spaces. Enter 1 3 0 in the first row and 2 3 3 in rows 2 - 4, then hit alt-Q and return. You have created an ascii file, probably with a .d01 extension, with the following contents:

       1   31.821    6.314    3.078
       5    3.365    2.015    1.476
      10    2.764    1.812    1.372
      15    2.602    1.753    1.341
      20    2.528    1.725    1.325
      25    2.485    1.708    1.316
      30    2.457    1.697    1.310
      40    2.423    1.684    1.303
      60    2.390    1.671    1.296
      80    2.374    1.664    1.292
     100    2.364    1.660    1.290
     120    2.358    1.658    1.289

Add column headings, maybe a little picture on top like most books have; voilà, a t table. You can even make the picture with the help of kspdat (and your favorite graphing program and your favorite graphics editor and your favorite graphics incorporating word processor).

Degrees of freedom (df) can be specified as the independent variable only with dependent variables that are x values from the chi-square, F, Student's t distributions and their noncentral versions.
In specifying the dependent variable, substitute the right tail area for df in the ch, nx, st and nt specifications, and for df2 in the fd and nf specs. For the F and noncentral F, the ind var df is used as the denominator degrees of freedom. df1 must still be specified. In all cases, x must be specified as what to compute.

Examples, with the independent variable specified as df 1(1)30:

    this dep var spec:    computes a column of 30 values with

    ch .05 x              5% right tail % pts for the chi-square dist
    ch .95 x              5% left tail % pts for the chi-square dist
    nx .95 8.5 x          5% left tail % pts for the noncentral
                          chi-square dist with noncentrality 8.5
    st .01 x              1% right tail % pts for the Student's t dist
    fd 5 .025 x           2.5% right tail % pts for F with 5 numerator df

My main motivation in writing kspdat was to produce data sets for graphing. Try this exercise to see how to do this:

Start kspdat again and use bino as the name of the output file. Enter x -0.5(.1)16.5 as the independent variable. In c2, put bi 16 .5 bar and in c3 put no 8 2 pdf. Go to "compute" and then "save data". In save data, make sure the number of tables is "many" (hit the T key when you're at the "number of tables:" prompt if necessary). Make the menu look like this:

┌──────────────────────────────────────────────────────────────┐
│ kspdat save data     2/26/90 22:33     387K free mem         │
│                                                              │
│ missing values: mvcode          number of tables: many       │
│                                                              │
│ col column     number of blank   number of spaces for digits │
│ no. name       leading spaces    before d.p.   after d.p.    │
│  1  x-0.5(.1)        2               2             2         │
│  2  bib16.5          2               1             6         │
│  3  nop82            2               1             6         │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Save the data and exit the program. You should have 4 new files: bino.d01, bino.c01, bino.d02 and bino.c02. The ".d" files are data files and the ".c" files are codebook files. The codebook files are used by ksstat for missing value info, column names and so on. For humans, they are a record of what's in the data files.

Look at bino.d01.
It has all x values that are n + .5, where n is an integer, repeated three times, with three different values in the second column. These values are there to allow a graphing program that can graph a data set to trace out a histogram of the binomial pdf with n = 16 and p = .5. We really don't need the .1 step size for this, but we do for the second data set, bino.d02. This contains values of the normal (Gaussian) pdf with µ = 8 and σ = 2, the same mean and standard deviation as the binomial distribution in bino.d01. Graphing both of these data sets together will show a normal curve superimposed on top of the binomial histogram.

I can't supply a graphing program to do the graphing, but can recommend gnuplot. An early version of gnuplot produced the graphs in the files bino.com and bino.prn. All of these files are in the self extracting archive kspdbn.exe. To see the image on screen, run bino.com. Use the command

    copy bino.prn prn /b

to print the image on an Epson compatible printer. Bino.com was produced using a program called grabber. With screen images captured by grabber as .com files, the directory utility dirmagic can be used to make simple but effective (and cheap) slide shows.

There is one additional file in kspdbn.exe, bino.spc. This is a kspdat specification file created with the "save spec" option in the main menu. It has all of the stuff that I typed into kspdat to create the .c## and .d## files. The "get spec" menu option reads .spc files. Once a .spc file is read in, you can edit it and produce new output without retyping all the information.


running kspdat

In running kspdat, you have some purpose in mind: a picture you want to draw or a table you want to write. To accomplish your task, you will have to

  1. give a name for the data file(s).

  2. describe the independent variable. In a table, the independent variable is what goes in the first column. In a graph, it is what is graphed on the horizontal (x) axis.

  3. describe the dependent variable(s). These are what go in all other columns of a table or are graphed on the vertical (y) axis of a graph. You may want more than one of these for either tables or graphs.

  4. compute the values of the dependent and independent variables.

  5. save the computed data to disk file(s) for further processing with either a text editor or graphics program.

  6. optionally, save the specifications used to generate your disk files so that you can later recall them for reuse without retyping. Since some specifications can be lengthy, this can save time.

  7. exit the program.

The next section describes the menu selections used to achieve these steps.


the kspdat main menu

The main menu consists of 17 choices or places to enter data. There is a help window at the bottom of the screen that shows brief help for each menu selection. The arrow keys will move you through the selections, with one exception: in the dependent variable selection area at the bottom of the menu, use alt-Q to leave this area. The menu choices are:

exit
    Press x to leave the program. If there is unsaved data, you will be given a chance to rescind your choice.

help
    Pressing the enter key will bring up a series of help screens. Esc brings you back to the main menu.

save spec
    After you have entered all the information necessary to produce your output, including the formatting information entered in the save data menu selection, you can save everything you typed in to a .spc file. Hit enter to begin. You will be asked for a file name to use. The .spc extension is automatically used, so you need not type this in.

get spec
    Retrieves information saved previously in a .spc file. You will be asked for a file name. As with save spec, the .spc extension is forced.

view data
    After using compute, there is data in memory. Hitting enter on this selection allows you to see that data. Column names will appear at the top.
    These are constructed from the information used to generate the data and are not pretty, but they serve as a rough reminder of what is in the column. Don't worry about the format here. You can specify that when saving the data to disk. The word missing will appear in place of any values not computed.

view names
    Lets you see the names of the columns of data currently in memory.

compute
    This is what gets the work done. After you have specified dependent and independent variable information, come here to actually compute the data.

dir
    Shows a disk directory.

save data
    This lets you create your output file or files. The kspdat save data menu will appear. There are two choices to be made at the top of the menu.

    When you enter the menu, the cursor will be at the "missing values:" prompt, with "mvcode" showing. mvcode stands for missing value code. If there are any values that are missing in your data, kspdat will output either a missing value code or a blank. Blanks are appropriate when tables are the final product, missing value codes when the data will be read by another program. Missing value codes are numbers that do not otherwise appear in the column with the missing value. They are chosen by the program and reported, along with column names, in a codebook file that is written along with the data file.

    You specified a data file name in the main menu. This name is used, with extension .dnn for data files and .cnn for codebook files. The nn are digits chosen to avoid conflict with other file names. A data file and its corresponding codebook file always have the same name and same digits.

    The second menu selection is "number of tables:". The choices are one and many. With many, there is one table produced for each dependent variable. Each table is written to a separate disk file. Each file contains two columns, the independent variable in the first column and a dependent variable in the second column.
    This multiple file arrangement is necessary for some plotting programs. Many must be chosen if you have any "bar" dependent variables and more than one dependent variable. With "one", bar and pdf produce the same output when there is more than one dependent variable. With only one dependent variable, the many or one choice is irrelevant.

    The third area of the save data menu allows you to specify the format of the output. For each column of output, specify three things: the number of leading blank spaces, the number of digits to allow for before the decimal point and the number of digits to print after the decimal point. If the number of digits after is 0, the decimal point is not printed.

    After entering this information, type alt-Q. You will be given the choice of saving the data, going back to edit the information entered or aborting.

view file
    Choosing this entry will let you see any disk file. You will be asked for the file name.

view dfile
    Lets you see the last data file created. When multiple files are written, this is the one with the largest number in the extension, corresponding to the last dependent variable column. You can see other data files with the view file selection.

view cfile
    Lets you see the last codebook file created. Same comments as view dfile.

view graph
    Lets you see rough graphs of one or more dependent variable columns against the independent variable. This is meant only to give you a rough idea of what the data looks like.

cols to graph:
    Lets you specify which columns to view.

data file:
    This is where you give the name to be used for the data and codebook files. No extension is necessary. If you give one, it will be discarded.

indep var:
    Here, you specify the independent variable, the contents of the first column of the table. Enter either x, cdf or df followed by a range of values specified in from(step)to format or in min, max, number of steps format. Don't mix the two formats. e.g.
        x 1(1)10               start at 1 and go to 10 in steps of 1
        x 1,10,9               start at 1 and go to 10 in 9 steps
        cdf 0(0.1).5(.05)1     start at 0 and go to .5 in steps of .1,
                               then to 1 in steps of .05
        cdf 0,.5,5,.5,1,10     start at 0 and go to .5 in 5 steps,
                               then from .5 to 1 in 10 steps
        cdf 0,.5,5,.55,1,9     does the same thing
        df 1(1)30(10)120       both of these generate
        df 1,30,29,40,120,8    1 2 .. 30 40 50 .. 120

    Personally, I much prefer the () notation and hardly ever use the other.

dep var
    When you hit return or the down arrow from the "indep var:" prompt, the cursor will go to the blank area below and you will be prompted with c2:. At this point, enter

      1. a two letter code for the distribution
      2. values of the parameters
      3. bar, pdf, cdf, rel, haz or x

    The sixth help screen summarizes the choices:

    enter <dist> <params> <bar,pdf,cdf,haz,rel or x>
    use two letter code for dist, values of params in order listed

    F               fd  df1 df2        binomial        bi  n p
    noncen F        nf  df1 df2 nc     disc uniform    du  min max
    gamma           ga  Θ a            disc Weibull    dw  p ß
    inv Gausn       ig  µ lambda       hypergeometric  hy  N n k
    Laplace         la  a b            neg binomial    nb  p n
    lognormal       ln  µ σ            Poisson         po  µ
    logistic        lo  µ σ            beta            be  a b
    normal          no  µ σ            Cauchy          ca  a b
    observed        ob  fname col#     chi-square      ch  df
    Pareto          pa  b              noncen chi-sq   nx  df nc
    Rayleigh        ra  b              cont uniform    cu  min max
    Students t      st  df             extr value lg   el  a b
    noncen t        nt  df nc          extr value sm   es  a b
    triangular      tr  a b            exponential     ex  µ
    Weibull         we  Θ ß δ

    Descriptions of the distributions and their parameters can be found in ks.doc.

    The solidus can be used to enter fractions, e.g. you can use 1/3 (1.0 / 3, 1/3.0, etc) instead of .33333333333333333. Mixed numbers cannot be used. That is, 4 1/3 is taken as two separate numbers, not as 13/3.

    If the independent variable is x, the dependent variables can be any combination of pdf, bar, cdf or rel from discrete distributions or pdf, cdf, haz or rel from continuous distributions.

    If the independent variable is cdf, the dependent variables can be x from any distribution or observed data.
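For readers who want to check what a from(step)to spec for the independent variable will generate before typing it in, the notation can be sketched in a few lines of Python. This is not kspdat's own parser: `expand_range` is a hypothetical name, only the parenthesis form is handled, and the sketch assumes that each segment stops at the last value that does not pass its upper limit.

```python
import re
from decimal import Decimal


def expand_range(spec):
    """Expand a from(step)to[(step)to...] spec, e.g. '1(1)30(10)120'."""
    # '1(1)30(10)120' splits into ['1', '1', '30', '10', '120']:
    # a start value followed by (step, limit) pairs.
    parts = re.split(r"[()]", spec.strip())
    if len(parts) < 3 or len(parts) % 2 == 0:
        raise ValueError(f"not a from(step)to spec: {spec!r}")
    # Decimal keeps steps such as .1 exact, so 0(0.1).5 really ends on .5.
    numbers = [Decimal(p) for p in parts]
    values = [numbers[0]]
    for step, limit in zip(numbers[1::2], numbers[2::2]):
        if step <= 0:
            raise ValueError("steps must be positive")
        # Assumption: stop at the last value that does not pass the limit.
        while values[-1] + step <= limit:
            values.append(values[-1] + step)
    return [float(v) for v in values]


if __name__ == "__main__":
    print(expand_range("1(4)5(5)30(10)40(20)120"))
    print(len(expand_range("0(0.1).5(.05)1")))
```

The first call lists the twelve df values used in the t table example; the second counts the values produced by the cdf 0(0.1).5(.05)1 spec.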
    If you compute pdf values from a discrete distribution, you can compute either the actual pdf or bars. With bars, x is rounded to the nearest integer and the pdf is computed for that integer. This is useful for graphing.

    Examples:

    bi 10 .4 bar       binomial n = 10 p = .4           bar format
    bi 10 .4 pdf       binomial n = 10 p = .4           pdf format
    no 0 1 cdf         normal (Gaussian) µ = 0 σ = 1    cdf
    ga 2 .5 x          gamma a = 2 Θ = .5               inverse cdf
    ob c:df.dat 3      observed data in file df.dat, col 3

    Use <enter>, the up arrow or the down arrow to complete the current entry. The cursor will go to the next available field, scrolling if necessary. Use alt-Q to end data entry. During data entry, you can use the up and down arrow keys to move to a previously entered field to edit it.

    The information you type is checked for form and number of parameters, but parameter values are not checked for correctness. Misspecified parameters will lead to either missing or garbage output. For example, giving 10.5 for the binomial n will be accepted but will be truncated to 10.

Most computed values are good to 12+ significant digits. A few algorithms used, for the normal cdf and inv, t inv, chi sq inv, F inv, gamma cdf and inv, and beta cdf and inv, require that the number of significant digits in the returned value be specified beforehand. For most cases, the number specified is 10. This is a compromise between computation speed and accuracy. For references, see ks.doc.
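In the spirit of putting the program through its paces, the small binomial table from the first example can be recomputed independently. The Python sketch below is not part of kspdat: `binom_cdf`, `fmt` and `table_row` are hypothetical names, and the layout assumes that the saved file follows the three save data numbers (leading blanks, digits before the decimal point, digits after) exactly as described in the save data section.

```python
from math import comb, floor


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ binomial(n, p), summing the pmf directly."""
    top = min(floor(x), n)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(top + 1))


def fmt(value, lead, before, after):
    """Format one number from (leading blanks, digits before, digits after)."""
    if after == 0:
        # No decimal point is printed when zero digits follow it.
        return " " * lead + f"{round(value):>{before}d}"
    return " " * lead + f"{value:>{before + 1 + after}.{after}f}"


def table_row(x, n=10, probs=(0.2, 0.4, 0.6, 0.8)):
    """One line of the bintable example: x, then the cdf for each p."""
    cells = [fmt(x, 2, 2, 0)]
    cells += [fmt(binom_cdf(x, n, p), 2, 1, 4) for p in probs]
    return "".join(cells)


if __name__ == "__main__":
    for x in range(11):
        print(table_row(x))
```

Running it prints eleven rows that can be compared, value by value, with the contents of the bintable data file. Summing the pmf directly is slow for large n but leaves little room for error, which is the point of a cross-check.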