The World of Computer Software

Text File | 1992-09-14 | 26KB | 794 lines

Chapter 4: COMMAND SYNTAX

To get EASISTAT to do anything you have to give it a command to
perform. Most commands will also require other information, such as
which columns to apply a particular analysis to. The way in which
this information is specified can be called the format or syntax of
the command. When each command is described in this manual its
format is defined with it, but to understand the format required
there are some important general rules it is helpful to become
familiar with first.

To illustrate these rules we will take the BASICS command, which is
described later in the manual in the following way:

    Format: B[asics] [r[anks]] [g[raphfile]] column [if condition]

This means that the command is called BASICS, but that all but the
first letter B is optional, that it can have an optional switch
called RANKS, that it can produce a graph file, that the column to
be studied must be specified, and that finally one can add logical
and arithmetic conditions limiting the rows to be studied by
entering IF followed by a condition. This means that one can enter
for instance:

    BASICS RANKS C1 IF (C2-C3)<7

This command would cause the basic description of the values in
column 1 to be displayed (their mean, mode, standard deviation,
etc.), taking only those rows for which the value in column 2 minus
the value in column 3 was less than 7, and would also display the
rank and frequency table of the values.

In the sections which follow we will look at the individual aspects
of command format more closely.

4:1. Selecting a command

Here is the list of commands available (it is displayed by EASISTAT
when the HELP command is selected):

    Help
    Titles
    Basics [G] [ranks] col [if cond]
    Data [save] [filename]
    List [if cond or variables]
    Chisq [G] [Fish] [Num] [c r]
    Wilcoxon [S] [col]
    Ttest [paired]
    Anova [G or N] [col]
    Kendall's [col col]
    Kolmogorov [G] [col]
    Regress [G] [col col [col]]
    Multiple regression [col [col]]
    Component analysis [G]
    Minimise [expression]
    Narrow cond
    Widen [cond]
    Arithmetic [expression]
    New [col or row or next]
    Label [col [name]]
    Format [col or def [width dec]]
    Derive [col [expression]]
    Delete [col or row or all]
    Echo
    Input [filename or close]
    Output [filename or close]
    Log [filename or close]
    Edit
    Screen
    Sort [d[own]] [col]
    Pause
    System [command]
    Limits [cl]
    Copy [lc,tr,rc,br,new lc,new tr]
    Macro [list or label exp]
    Quit

To select one of these commands when asked simply type in the first
part of the first word of the name of the command. You do not have
to type in the whole word. EASISTAT will select the first command
from the above list that matches the letters you have typed. So to
select the BASICS command you could type:

    BASICS

or:

    BAS

or:

    B

Note that if you type:

    W

you will get the WILCOXON command and not the WIDEN command. If you
want the WIDEN command you would have to type at least:

    WID

Remember everything can be typed in upper or lower case, or a
mixture. So:

    b

is just as good as:

    BASICS

To use the same example given above, one could type:

    B RANKS C1 IF (C2-C3)<7

or:

    bas ranks c1 if (c2-c3)<7

and achieve just the same result.

When the format of a command is described in this manual, square
brackets are put round the optional letters of the command name,
e.g.:

    B[asics]
    Wid[en]

4:2. Command options

Some of the commands can have different options attached to them.
An option appears in square brackets in the list above and when the
manual describes each command.
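Looking back at the selection rule in section 4:1 (the first command
in the HELP list whose first word begins with the letters typed,
ignoring case), the behaviour can be sketched in a few lines of
Python. This is only an illustration and not EASISTAT's own code:
the function name select_command is invented here, the list simply
copies the first word of each entry in the HELP list, and returning
None when nothing matches is an assumption.

```python
# First word of each command, in the order the HELP list shows them.
COMMANDS = [
    "HELP", "TITLES", "BASICS", "DATA", "LIST", "CHISQ", "WILCOXON",
    "TTEST", "ANOVA", "KENDALL'S", "KOLMOGOROV", "REGRESS", "MULTIPLE",
    "COMPONENT", "MINIMISE", "NARROW", "WIDEN", "ARITHMETIC", "NEW",
    "LABEL", "FORMAT", "DERIVE", "DELETE", "ECHO", "INPUT", "OUTPUT",
    "LOG", "EDIT", "SCREEN", "SORT", "PAUSE", "SYSTEM", "LIMITS",
    "COPY", "MACRO", "QUIT",
]


def select_command(typed):
    """Return the first listed command that starts with `typed`.

    Matching ignores case. None is returned when nothing matches
    (an assumption; the manual does not say what EASISTAT does then).
    """
    prefix = typed.strip().upper()
    if not prefix:
        return None
    for name in COMMANDS:
        if name.startswith(prefix):
            return name
    return None


if __name__ == "__main__":
    for typed in ("B", "bas", "W", "WID"):
        print(typed, "->", select_command(typed))
```

Because the list is searched from the top, W finds WILCOXON before
it ever reaches WIDEN, which is why at least WID has to be typed.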
An option can be selected or not, and sometimes more than one option
can be selected. BASICS has an option, RANKS, which makes it print
out a list of values in rank order together with values for rank,
percentage and cumulative percentage. An option can often be
abbreviated down to the first letter, although the whole word can be
typed for clarity. Taking the above example one could enter:

    BASICS RANKS C1 IF (C2-C3)<7

or:

    b r c1 if (c2-c3)<7

If the "r" is left out, the BASICS command will do exactly the same
thing but without also producing the rank/frequency table:

    b c1 if (c2-c3)<7

If the option to produce a graph file is selected then a file
containing data to produce a frequency histogram will be written to
disk and EASIGRAF will be run to display the graph:

    b g c1

The final option available in BASICS is IF followed by a condition
which specifies which rows to study. IF is exceptional in that it
must be written in full; just "I" would not be adequate. This option
similarly can be left out, in which case all the rows will be
included in the analysis:

    b c1

or:

    BASICS C1

When using the BASICS command it is obligatory to specify the column
for which the statistics are to be provided. For some of the other
commands a column or filename is shown as being optional by being
put in square brackets. This means that you do not have to type the
column in the command line, but then you will be asked by EASISTAT
to specify which column you mean in a second line. It's just a
question of convenience for you whether you put the column into the
command line or not. When you first start to use EASISTAT it is
better just to select a command and then EASISTAT will ask you for
other details, but as you grow more familiar with the program you
may feel like typing more into the first line.

There is an important point however.
Although if you omit a necessary parameter like a column number you
will usually be asked to supply it, if you omit an option like RANKS
or IF from the command line you cannot add it in later in a second
line. All the command options must be specified in the command line.

4:3. Referring to entries in the data table

EASISTAT is based around the concept of a table of values in which
all the different measured variables attached to one data object
(e.g. an experimental subject) are in one row, and are arranged in
columns according to the quantity they measure. So one column might
be for temperature, another for mass, etc. and one row might be for
one rock and another row for another. Thus the statistical commands
essentially operate down columns rather than across rows - the mean
of one rock's weight and temperature would make little sense.

In almost all commands we need to be able to tell EASISTAT which
columns we want to look at and one way to do this is to type C
followed immediately (no space) by a number between 1 and the number
of columns there are, e.g. C1, c3, c25. We used this syntax in the
example above when we said:

    b c1

meaning calculate the basic statistics of all the values appearing
in the first column.

Occasionally we may wish to refer to a specific row - when we want
to delete a certain row perhaps. To do this we type the keyword ROW
followed by a space and then the number of the row we mean, e.g.
row 4, ROW 29.

Every column of the table also has a label, which appears at the top
of the column in the data file and when using the data editor. This
would usually describe what it is that the values in that column are
measures of, such as AGE or SEX.
These labels can also be used to refer to the columns so that if the
second column is labelled AGE then the following both mean the same
thing:

    BASICS AGE
    BASICS C2

In the logical and arithmetic expressions described below it is
necessary to refer to the values which appear in cells in different
rows. The rows are looked at one at a time and then the value
specified is taken from that row, so that in a logical expression
c15 means "the value appearing in column 15 of the row I am
currently interested in". In this context the keyword ROW (here
appearing with no number after it) means "the number of the row I am
currently interested in". In fact, we don't think you'll be
referring to rows much and the whole thing should become fairly
clear with a few examples.

4:4. Logical expressions

In the example above there is a condition which specifies which rows
are to be included in the analysis, and in general a condition is a
statement of a logical expression which is either true or false. So
c5>6 is true for a given row if the value in the fifth column of
that row is 9, but false if it is 2. Conditions are used to specify
whether or not a row belongs to a given group, and in the example
given whether or not it should be included with the other values
towards deciding the sample mean, standard deviation, etc.

    b c1 if (c2-c3)<7

means that if the statement given is true for a particular row then
the value in column one will be taken and added to the sample and
incorporated in the measurement of mean and standard deviation, but
if not that particular row will be ignored for the purpose of the
current analysis and left out of the sample.

Here is a list of all the logical operators:

    x | y      x OR y (either can be true)
    x & y      x AND y (both must be true)
    x != y     x NOT EQUAL TO y
    x = y      x EQUAL TO y
    x < y      x LESS THAN y
    x > y      x GREATER THAN y
    x <= y     x LESS THAN OR EQUAL TO y
    x >= y     x GREATER THAN OR EQUAL TO y
    ! x        NOT x

The expression is true if the relationship between x and y is as
described, but false if not. The OR relationship is true if either
condition x or condition y is true, and the AND relationship is true
only if both condition x and condition y are true. (Note that the
symbol for the OR operator, |, appears as a broken vertical line on
most keyboards and is usually found on the same key as the
backslash, \.)

Here are some examples:

    c13>c12
        the value in column 13 must be higher than the value in
        column 12

    c13<=c12
        the value in column 13 must be less than or equal to the
        value in column 12

    !(c13>c12)
        the value in column 13 must not be greater than the value
        in column 12, i.e. the same effect as the previous example

    c5>5
        the value in column 5 must be greater than 5

    (c5>5)|(c6<10)
        either the value in column 5 must be greater than 5 or the
        value in column 6 must be less than 10 for the condition to
        be true

    (c5>5)&(c6<10)
        both the value in column 5 must be greater than 5 and the
        value in column 6 must be less than 10 for the condition to
        be true

    (c5>5)&(c5<10)
        the value in column 5 must both be greater than 5 and less
        than 10, i.e. the value in column 5 must lie between 5
        and 10

    (c5=3)|(c5=5)|(c5=7)
        the value in column 5 must be 3, 5 or 7

    (c5=3)&(c5=5)&(c5=7)
        this can never be true, the value in column 5 cannot be 3
        and 5 and 7 all at once

    (c5!=3)&(c5!=5)&(c5!=7)
        the value in column 5 must be anything except 3, 5 or 7

    (c5=2.2) | (((c5>5) & (c5<10)) & !((c5>6) & (c5<7)))
        the value in column 5 can be 2.2, or else it must lie
        between 5 and 10 but must not lie between 6 and 7

So you see expressions can be made as simple or as complicated as
you like, and sub-clauses are contained in brackets. In fact the
brackets are not always necessary - see the section below on
operator precedence. Spaces occurring in logical and arithmetic
expressions are ignored.
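To make the idea of a condition selecting rows concrete, here is a
small Python sketch of what "b c1 if (c2-c3)<7" asks for, plus the
last and most complicated example above. It is only an illustration
under stated assumptions: the data table is invented, the function
names are hypothetical, and Python's "and", "or" and "not" stand in
for EASISTAT's &, | and ! operators.

```python
def basics_sample(rows, condition):
    """Collect column 1 from every row for which `condition` is true."""
    return [c1 for (c1, c2, c3) in rows if condition(c1, c2, c3)]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Invented data table: each tuple is one row (c1, c2, c3).
ROWS = [
    (10.0, 9.0, 1.0),  # c2-c3 = 8, so the row is left out
    (20.0, 5.0, 3.0),  # c2-c3 = 2, so the row is included
    (30.0, 4.0, 4.0),  # c2-c3 = 0, so the row is included
]


def example_condition(c1, c2, c3):
    # b c1 if (c2-c3)<7
    return (c2 - c3) < 7


def compound(c5):
    # (c5=2.2) | (((c5>5) & (c5<10)) & !((c5>6) & (c5<7)))
    return (c5 == 2.2) or (
        ((c5 > 5) and (c5 < 10)) and not ((c5 > 6) and (c5 < 7))
    )


if __name__ == "__main__":
    sample = basics_sample(ROWS, example_condition)
    print(sample, mean(sample))
    print([v for v in (2.2, 3, 5.5, 6.5, 8, 10) if compound(v)])
```

Only the rows that pass the condition contribute to the mean; the
other rows are simply skipped, exactly as the text describes.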
When the keyword ROW is used in a condition then the number of the
row itself is tested, so:

    row<5

is true if the number of the row is less than 5, i.e. one of the
first four rows. Contrast this to:

    c1<5

which as we have already discussed means the rows for which the
value in the first column is less than 5. There is no reason why
references to rows and columns cannot be combined in the same
expression, e.g.:

    (c1<10) & (row<5)

or even:

    c1<row

Note to users familiar with spreadsheets: You will probably realise
that these references to values are what are usually called relative
references; they relate only to the columns of the row in question.
There is for example no direct way to compare values in an
expression to the value in the third column of the fifth row - an
absolute reference. A partial exception to this, allowing references
to the values in the first valid row only, is described in the
section on the ARITHMETIC command.

4:5. Arithmetic expressions

When we said C2-C3 in the first example, that was an arithmetic
expression meaning (obviously) the value in column 2 minus the value
in column 3.
Many other operations are available, and here is a complete list:

    x - y      x MINUS y
    x + y      x PLUS y
    x mod y    x MODULO y
    x / y      x DIVIDED BY y
    x * y      x TIMES y
    x pow y    x TO THE yTH POWER
    x pX y     PROBABILITY OF CHI-SQUARED x WITH y DEGREES OF FREEDOM
    x pT y     PROBABILITY OF STUDENT'S T x WITH y DEGREES OF FREEDOM
    - x        NEGATIVE x
    abs x      ABSOLUTE VALUE OF x
    pN x       ONE-TAILED CUMULATIVE PROBABILITY OF x IN A NORMAL
               DISTRIBUTION
    log x      LOG x BASE 10
    ln x       NATURAL LOG OF x (base e)
    exp x      e TO THE xTH POWER (natural antilog)
    lfact x    NATURAL LOG OF FACTORIAL x
    sin x      SIN x RADIANS
    arcsin x   ARCSIN x IN RADIANS
    cos x      COS x RADIANS
    arccos x   ARCCOS x IN RADIANS
    tan x      TAN x RADIANS
    arctan x   ARCTAN x IN RADIANS

Arithmetic expressions can be written out just as they would appear
on paper, again with brackets to clarify the order of operations:

    c3+c4*6
    (c3+c4)*6
    (c4*c5)pow3
    (c4*c5*c6)pow(1/3)

The last example takes the geometric mean of columns 4, 5, and 6. It
multiplies them together and then takes the cube root of the
product.

    c13 pX 3

This example would take the value in column 13 to be a chi-squared
statistic with three degrees of freedom and compute its probability,
or p value. See the section on the ARITHMETIC command for details of
how to use EASISTAT as a set of statistics tables to look up.

Numbers can be written for arithmetic expressions either as ordinary
decimal numbers (37, -45.236, 11.4) or using exponential notation,
where the mantissa is followed immediately by an E and the power of
ten to multiply it by. Thus 3.3e5 is the same as 330000 and 2.7e-4
means 0.00027. The results of the ARITHMETIC command are output in
this format where appropriate.

4:6. Operator precedence

Operator precedence means the order in which logical or arithmetic
operations are performed. We are all familiar with the idea that:

    3+4/2

means three added to four-divided-by-two and not three-plus-four
divided by two, so that the answer is 5 and not 3.5.
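The arithmetic so far can be checked in any language that follows
the usual conventions. The Python sketch below does so with invented
cell values; the function names are hypothetical, and Python's **
operator stands in for EASISTAT's pow.

```python
import math


def geometric_mean(c4, c5, c6):
    # (c4*c5*c6)pow(1/3): multiply, then take the cube root
    return (c4 * c5 * c6) ** (1 / 3)


def without_brackets(c3, c4):
    # c3+c4*6: the multiplication is performed first
    return c3 + c4 * 6


def with_brackets(c3, c4):
    # (c3+c4)*6: the brackets force the addition to be done first
    return (c3 + c4) * 6


if __name__ == "__main__":
    print(3 + 4 / 2)                # division before addition
    print(without_brackets(2, 3))   # 2 + (3*6)
    print(with_brackets(2, 3))      # (2+3) * 6
    # Floating point cube roots are approximate, so compare with
    # isclose rather than testing for exact equality.
    print(math.isclose(geometric_mean(2, 4, 8), 4.0))
    print(3.3e5 == 330000, 2.7e-4 == 0.00027)
```

The two bracket examples differ only in the order of operations, and
that is the whole subject of operator precedence.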
Here is a complete list of operator precedences:

    x pX y  x pT y  x pow y  x / y  x * y  x - y  x + y  x < y  x > y  x <= y  x >= y  x != y  x = y  x & y  x | y

The higher up an operator is, the sooner it is performed, as if it
was enclosed in a pair of brackets. Operators in the same line are
evaluated left to right, in the order in which they are read. This
means that writing:

    c1+c2*c3+c4powc5*4pt15

is equivalent to writing:

    c1 + (c2*c3) + ((c4powc5)*(4pt15))

but that writing:

    c1-c2+c3

is equivalent to writing:

    (c1-c2)+c3

If you want the operations carried out in any other order then you
have to explicitly specify it yourself using brackets. Note that you
do not have to use brackets in the expression:

    (c5>5)&(c5<10)

because:

    c5>5&c5<10

will be interpreted in exactly the same way - the > and < take
precedence over the &.

4:7. Combining arithmetic and logical operations

Not only can arithmetic and logical operations be arbitrarily
complex, they can also be combined together in very flexible ways.
Of course we have already seen examples of this because "(c2-c3)<7"
combines an arithmetic expression, "c2-c3", with a logical one,
"result<7". If this is the first time you are reading this manual
then you may want to skip this section and come back to it later.

Still here? Okay, here we go. The first point is that arithmetic
values also have a logical value, which means that any number not
equal to zero is logically true and any number equal to zero is
false. So:

    1          is true
    0          is false
    4-3        is true
    3-4        is true
    3-3        is false
    3|1.5|0    is true
    3&1.5&0    is false

The second, more useful, point is that any logical expression has an
arithmetic value, which is 1 if it is true and 0 if it is false. So:

    3<4 = 1
    3>4 = 0
    (3<4)+(4<5) = 2

Which means we can write:

    c3*(c2<c3) + c2*(c2>=c3)

to mean "whichever is higher of c2 and c3". If c3 is higher then the
value inside the first brackets will be true, so 1. Then we multiply
c3 by 1 and get c3.
But the value in the second brackets will be false, or 0, and c2
multiplied by 0 is 0 and the whole expression will evaluate to
c3*1+0, or c3. If c2 is higher than c3 or equal to it then c2 will
be given as the result.

Here's another one:

    LN( C15*(C15>0) + 0.0001*(C15<=0) )

The idea of this is to take the natural log of c15. But if c15 were
less than or equal to zero then trying to take its log would cause
an error, so what this expression in fact does is to take the
natural log of c15 if it is greater than zero, but otherwise it
takes the natural log of 0.0001.

We hope this gives you an idea of the principles involved. You can
do almost anything with the operators provided, though some things
are a bit long-winded. More examples will be given in individual
command descriptions, and especially in the descriptions of the
ARITHMETIC, MACRO and INPUT commands.

Incidentally, perhaps you can see now why we could not just write:

    5<c5<10

to specify that c5 should lie between 5 and 10. EASISTAT would
interpret the expression as follows: 5<c5 is either true or false;
if it is true it has a value of 1 and if it is false it has a value
of 0; whether it has a value of 0 or 1 it will still be less than
10, so the expression as a whole will always be true.

4:8. General purpose variables

As well as the values in the data table, EASISTAT provides a number
of other variables which are not needed for the standard statistical
tests but which may be used for a variety of other purposes and are
particularly useful for general mathematical manipulations.

Firstly, there are twenty general purpose variables which can be set
to any values and used in arithmetic and logical expressions. These
variables are referred to as V1, V2...V20, but they can also be
labelled like the data columns and then referred to by these labels.
The values are set by the DERIVE command and the labels are set by
the LABEL command.
For example if one wishes to have a value for pi available in
expressions one could enter:

    LABEL V1 PI
    DERIVE PI ARCCOS(-1)

This would set the label of the first general purpose variable to be
pi, and then set its value to be the arccos of -1 in radians. It
could thereafter be referred to in expressions as pi, e.g.:

    C2+PI*C3

These variables can be displayed with the command:

    LIST VARIABLES

which can be shortened to:

    L V

There are three special variables which are set by the BASICS
command, and these are referred to as XNUMBER, XTOTAL and XMEAN.
When the BASICS command is carried out, XNUMBER is set to the number
of items present (valid rows used), XTOTAL is set to the total of
the values used and XMEAN is set to the mean of the values (which
would be XTOTAL/XNUMBER). The purpose of these variables is to make
possible the construction of other statistical tests which may not
be directly built in to EASISTAT. The values obtained from the
BASICS command can be used in subsequent arithmetic expressions. An
example of this is given in the section on the INPUT command.

As an extension to this there are another twenty variables which are
set by the BASICS command and other statistical commands. These are
referred to as VV1, VV2...VV20. How these are set for each command
can be discovered by displaying them after the command is performed.
They are displayed by the command:

    LIST VV

So if for example one wishes to use the value of the standard error
of the mean as calculated by the BASICS command one could enter the
following:

    BASICS C5
    LIST VV

It then becomes clear that VV7 has been set to the standard error.
If one wished to make a permanent copy of it one could then set one
of the first twenty variables to have that value, since the value of
VV7 itself will be changed by the next statistics command which is
performed:

    DERIVE V1 VV7

4:9. Rules for label names

There are certain rules which must be followed for the labels used
so that expressions will be parsed correctly. In general any
combination of letters and digits can be used for the column titles
and the labels of variables, but there are certain exceptions. Also
note that there is no difference between upper and lower case
letters, and that spaces in expressions are ignored.

a) Don't start any label with a digit, e.g. 1stname, 2A, etc.

b) Don't start any label with a C or a V or a VV followed by a
   digit, e.g. C2D, c40, v2, VV4xyz, etc.

c) Don't call any column or variable ROW, P, G, E, RANKS, ZED,
   XNUMBER, XTOTAL or XMEAN.

d) Don't start any label with the name of a mathematical function,
   e.g. sint (uses SIN), pname (uses pN), etc. See the full list of
   arithmetic operators above.

e) Don't start any label with the name of another column or
   variable, i.e. if one column is called DA, then no column or
   variable label must begin with the letters DA. (Strictly speaking
   this rule only applies to columns to the right of the first one,
   that is to say if that one column is titled DA then no column to
   the right of it should begin with DA.)

If you get any problems with syntax errors or expressions not
behaving as you think they should then come back and check these
rules. For example if you did have a column called PNAME, then the
expression sin(PNAME) would give you a syntax error with AME,
because the parser would think that the PN referred to the normal
probability function. Similarly if you had a column labelled 4 then
the expression <4 would not mean "less than 4" but "less than the
value in the column labelled 4".
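The five rules above are mechanical enough to check automatically
before labels are typed in. The Python sketch below is a
hypothetical helper, not part of EASISTAT: the function name
label_problems is invented, the list of function names is copied
from the operator list in section 4:5 on the assumption that rule d
covers every name in it, and rule e is applied in its broad form
(against every other label) rather than only to columns further to
the right.

```python
import re

# Names forbidden outright by rule c.
RESERVED = {"ROW", "P", "G", "E", "RANKS", "ZED",
            "XNUMBER", "XTOTAL", "XMEAN"}

# Operator and function names from section 4:5 (assumed scope of rule d).
FUNCTIONS = ["MOD", "POW", "PX", "PT", "ABS", "PN", "LOG", "LN", "EXP",
             "LFACT", "SIN", "ARCSIN", "COS", "ARCCOS", "TAN", "ARCTAN"]


def label_problems(label, other_labels=()):
    """Return the letters of the rules (a to e) that `label` breaks."""
    name = label.upper()  # labels are not case sensitive
    problems = []
    if name[:1].isdigit():
        problems.append("a")  # starts with a digit
    if re.match(r"(C|VV?)\d", name):
        problems.append("b")  # looks like C1, V2 or VV3
    if name in RESERVED:
        problems.append("c")  # reserved word
    if any(name.startswith(f) for f in FUNCTIONS):
        problems.append("d")  # starts with a function name
    if any(name.startswith(o.upper()) for o in other_labels if o):
        problems.append("e")  # starts with another label
    return problems


if __name__ == "__main__":
    for candidate in ("AGE", "pname", "C2D", "1stname", "ROW"):
        print(candidate, label_problems(candidate))
    print("DATE", label_problems("DATE", ["DA"]))
```

A label that comes back with an empty list, such as AGE or PI, breaks
none of the rules as they are interpreted here.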