Text File | 1992-09-14 | 26KB | 794 lines
Chapter 4: COMMAND SYNTAX
To get EASISTAT to do anything you have to give it a
command to perform. Most commands will also require other
information, such as which columns to apply a particular
analysis to. The way in which this information is
specified can be called the format or syntax of the
command. When each command is described in this manual
its format is defined with it, but to understand the
format required there are some important general rules it
is helpful to become familiar with first.
To illustrate these rules we will take the BASICS
command, which is described later in the manual in the
following way:
Format: B[asics] [r[anks]] [g[raphfile]] column [if condition]
This means that the command is called BASICS, but that
every letter after the initial B is optional, that it can have
an optional switch called RANKS, that it can produce a
graph file, that the column to be studied must be
specified, and that finally one can add logical and
arithmetic conditions limiting the rows to be studied by
entering IF followed by a condition. This means that one
can enter for instance:
BASICS RANKS C1 IF (C2-C3)<7
This command would cause the basic description of the
values in column 1 to be displayed (their mean, mode,
standard deviation, etc.), taking only those rows for
which the value in column 2 minus the value in column 3
was less than 7, and would also display the rank and
frequency table of the values. In the sections which
follow we will look at the individual aspects of command
format more closely.
4:1. Selecting a command
Here is the list of commands available (it is displayed
by EASISTAT when the HELP command is selected):
Help                                Titles
Basics [G] [ranks] col [if cond]    Data [save] [filename]
List [if cond or variables]         Chisq [G] [Fish] [Num] [c r]
Wilcoxon [S] [col]                  Ttest [paired]
Anova [G or N] [col]                Kendall's [col col]
Kolmogorov [G] [col]                Regress [G] [col col [col]]
Multiple regression [col [col]]     Component analysis [G]
Minimise [expression]               Narrow cond
Widen [cond]                        Arithmetic [expression]
New [col or row or next]            Label [col [name]]
Format [col or def [width dec]]     Derive [col [expression]]
Delete [col or row or all]          Echo
Input [filename or close]           Output [filename or close]
Log [filename or close]             Edit
Screen                              Sort [d[own]] [col]
Pause                               System [command]
Limits [cl]                         Copy [lc,tr,rc,br,new lc,new tr]
Macro [list or label exp]           Quit
To select one of these commands when asked simply type in
the first part of the first word of the name of the
command. You do not have to type in the whole word.
EASISTAT will select the first command from the above
list that matches the letters you have typed. So to
select the BASICS command you could type:
BASICS
or:
BAS
or:
B
Note that if you type:
W
you will get the WILCOXON command and not the WIDEN
command. If you want the WIDEN command you would have to
type at least:
WID
Remember everything can be typed in upper or lower case,
or a mixture. So:
b
is just as good as:
BASICS
To use the same example given above, one could type:
B RANKS C1 IF (C2-C3)<7
or:
bas ranks c1 if (c2-c3)<7
and achieve just the same result.
When the format of a command is described in this manual,
square brackets are put round the optional letters of the
command name, e.g.:
B[asics]
Wid[en]
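The matching rule described above can be sketched in Python. This is only an illustration, not EASISTAT's own code: the function name select_command is hypothetical, and the list order assumes the HELP table is read across each row from left to right, which the manual does not state explicitly.

```python
# First word of each command, in the order of the HELP list
# (assumed here to be read row by row, left to right).
COMMANDS = [
    "Help", "Titles", "Basics", "Data", "List", "Chisq",
    "Wilcoxon", "Ttest", "Anova", "Kendall's", "Kolmogorov", "Regress",
    "Multiple", "Component", "Minimise", "Narrow", "Widen", "Arithmetic",
    "New", "Label", "Format", "Derive", "Delete", "Echo",
    "Input", "Output", "Log", "Edit", "Screen", "Sort",
    "Pause", "System", "Limits", "Copy", "Macro", "Quit",
]


def select_command(typed):
    """Return the first command whose name starts with the typed letters."""
    typed = typed.lower()  # upper and lower case are treated alike
    for name in COMMANDS:
        if name.lower().startswith(typed):
            return name
    return None  # nothing in the list matches


print(select_command("W"))    # Wilcoxon comes before Widen in the list
print(select_command("wid"))  # three letters are needed to reach Widen
```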
4:2. Command options
Some of the commands can have different options attached
to them. An option appears in square brackets in the list
above and when the manual describes each command. An
option can be selected or not, and sometimes more than
one option can be selected. BASICS has an option, RANKS,
which makes it print out a list of values in rank order
together with values for rank, percentage and cumulative
percentage. An option can often be abbreviated down to
the first letter, although the whole word can be typed
for clarity. Taking the above example one could enter:
BASICS RANKS C1 IF (C2-C3)<7
or:
b r c1 if (c2-c3)<7
If the "r" is left out, the BASICS command will do
exactly the same thing but without also producing the
rank/frequency table:
b c1 if (c2-c3)<7
If the option to produce a graph file is selected then a
file containing data to produce a frequency histogram
will be written to disk and EASIGRAF will be run to
display the graph:
b g c1
The final option available in BASICS is IF followed by a
condition which specifies which rows to study. IF is
exceptional in that it must be written in full; just "I"
would not be adequate. This option can similarly be left
out, in which case all the rows will be included in the
analysis:
b c1
or:
BASICS C1
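To make the structure of a BASICS command line concrete, here is a rough Python sketch that splits one into its parts. It assumes the layout given in the format line (command, optional RANKS and GRAPHFILE switches, column, optional IF condition); the function name parse_basics and the dictionary keys are hypothetical, and the real parser may well behave differently in edge cases.

```python
import re


def parse_basics(line):
    """Split a BASICS command line into options, column and condition."""
    # IF must be written in full; everything after it is the condition.
    parts = re.split(r"\s+if\s+", line.strip(), maxsplit=1, flags=re.IGNORECASE)
    words = parts[0].split()
    condition = parts[1] if len(parts) == 2 else None

    if not "basics".startswith(words[0].lower()):
        raise ValueError("not a BASICS command")

    result = {"ranks": False, "graph": False,
              "column": words[-1], "condition": condition}
    # Words between the command name and the column are option switches,
    # each of which may be abbreviated down to its first letter.
    for word in words[1:-1]:
        if "ranks".startswith(word.lower()):
            result["ranks"] = True
        elif "graphfile".startswith(word.lower()):
            result["graph"] = True
        else:
            raise ValueError("unknown option: " + word)
    return result


print(parse_basics("b r c1 if (c2-c3)<7"))
print(parse_basics("BASICS C1"))
```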
When using the BASICS command it is obligatory to specify
the column for which the statistics are to be provided.
For some of the other commands a column or filename is
shown as being optional by being put in square brackets.
This means that you do not have to type the column in the
command line, but then you will be asked by EASISTAT to
specify which column you mean in a second line. It's just
a question of convenience for you whether you put the
column into the command line or not. When you first start
to use EASISTAT it is better just to select a command and
then EASISTAT will ask you for other details, but as you
grow more familiar with the program you may feel like
typing more into the first line. There is an important
point however. Although if you omit a necessary parameter
like a column number you will usually be asked to supply
it, in contrast if you omit an option like RANKS or IF
from the command line you cannot add it in later in a
second line. All the command options must be specified in
the command line.
4:3. Referring to entries in the data table
EASISTAT is based around the concept of a table of values
in which all the different measured variables attached to
one data object (e.g. an experimental subject) are in one
row, and are arranged in columns
according to the quantity they measure. So one column
might be for temperature, another for mass, etc. and one
row might be for one rock and another row for another.
Thus the statistical commands essentially operate down
columns rather than across rows - the mean of one rock's
weight and temperature would make little sense.
In almost all commands we need to be able to tell
EASISTAT which columns we want to look at and one way to
do this is to type C followed immediately (no space) by a
number between 1 and the number of columns there are,
e.g. C1, c3, c25. We used this syntax in the example
above when we said:
b c1
meaning calculate the basic statistics of all the values
appearing in the first column. Occasionally we may wish
to refer to a specific row - when we want to delete a
certain row perhaps. To do this we type the keyword ROW
followed by a space and then the number of the row we
mean, e.g. row 4, ROW 29.
Every column of the table also has a label, which appears
at the top of the column in the data file and when using
the data editor. This would usually describe what it is
that the values in that column are measures of, such as
AGE or SEX. These labels can also be used to refer to the
columns so that if the second column is labelled AGE then
the following both mean the same thing:
BASICS AGE
BASICS C2
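The two ways of naming a column can be pictured with a short Python sketch. The labels SEX, AGE and WEIGHT and the function name column_number are made up for the illustration; only the C-plus-number form and the use of labels come from the text above.

```python
import re

# Hypothetical column labels, with AGE as the second column.
labels = ["SEX", "AGE", "WEIGHT"]


def column_number(ref, labels):
    """Return the 1-based column number for a reference such as C2 or AGE."""
    match = re.fullmatch(r"[cC](\d+)", ref)
    if match:
        number = int(match.group(1))
        if 1 <= number <= len(labels):
            return number
        raise ValueError("no such column: " + ref)
    # Otherwise treat the reference as a label, ignoring case.
    upper = [label.upper() for label in labels]
    return upper.index(ref.upper()) + 1


print(column_number("AGE", labels), column_number("c2", labels))
```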
In the logical and arithmetic expressions described below
it is necessary to refer to the values which appear in
cells in different rows. The rows are looked at one at a
time and then the value specified is taken from that row,
so that in a logical expression c15 means "the value
appearing in column 15 of the row I am currently
interested in". In this context the keyword ROW (here
appearing with no number after it) means "the number of
the row I am currently interested in". In fact, we don't
think you'll be referring to rows much and the whole
thing should become fairly clear with a few examples.
4:4. Logical expressions
In the example above there is a condition which specifies
which rows are to be included in the analysis, and in
general a condition is a statement of a logical
expression which is either true or false. So c5>6 is true
for a given row if the value in the fifth column of that
row is 9, but false if it is 2. Conditions are used to
specify whether or not a row belongs to a given group,
and in the example given whether or not it should be
included with the other values towards deciding the
sample mean, standard deviation, etc.
b c1 if (c2-c3)<7
means that if the statement given is true for a
particular row then the value in column one will be taken
and added to the sample and incorporated in the
measurement of mean and standard deviation, but if not
that particular row will be ignored for the purpose of
the current analysis and left out of the sample.
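The effect of the condition can be mimicked in a few lines of Python. The data values below are invented, and the sketch uses the sample standard deviation from Python's statistics module; which form of standard deviation EASISTAT itself reports is not stated here, so treat that choice as an assumption.

```python
from statistics import mean, stdev

# Invented data: each row holds (c1, c2, c3).
rows = [
    (10.0, 9.0, 4.0),
    (12.0, 20.0, 5.0),
    (14.0, 8.0, 7.0),
    (16.0, 6.0, 2.0),
]

# Keep the c1 value only from rows where (c2-c3)<7 is true.
sample = [c1 for c1, c2, c3 in rows if (c2 - c3) < 7]

print(sample)         # the second row is left out, since 20-5 is not below 7
print(mean(sample))   # mean of the remaining c1 values
print(stdev(sample))  # sample standard deviation (an assumption, see above)
```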
Here is a list of all the logical operators:
   x | y      x OR y (either can be true)
   x & y      x AND y (both must be true)
   x != y     x NOT EQUAL TO y
   x = y      x EQUAL TO y
   x < y      x LESS THAN y
   x > y      x GREATER THAN y
   x <= y     x LESS THAN OR EQUAL TO y
   x >= y     x GREATER THAN OR EQUAL TO y
   ! x        NOT x
The expression is true if the relationship between x and
y is as described, but false if not. The OR relationship
is true if either condition x or condition y is true, and
the AND relationship is true only if both condition x and
condition y are true. (Note that the symbol for the OR
operator, |, appears as a broken vertical line on most
keyboards and is usually found on the same key as the
backslash, \.) Here are some examples:
c13>c12
- the value in column 13 must be higher than the value in
column 12
c13<=c12
- the value in column 13 must be less than or equal to
the value in column 12
!(c13>c12)
- the value in column 13 must not be greater than the
value in column 12, i.e. the same effect as the previous
example
c5>5
- the value in column 5 must be greater than 5
(c5>5)|(c6<10)
- either the value in column 5 must be greater than 5 or
the value in column 6 must be less than 10 for the
condition to be true
(c5>5)&(c6<10)
- both the value in column 5 must be greater than 5 and
the value in column 6 must be less than 10 for the
condition to be true
(c5>5)&(c5<10)
- the value in column 5 must both be greater than 5 and
less than 10, i.e. the value in column 5 must lie between
5 and 10
(c5=3)|(c5=5)|(c5=7)
- the value in column 5 must be 3, 5 or 7
(c5=3)&(c5=5)&(c5=7)
- this can never be true, the value in column 5 cannot be
3 and 5 and 7 all at once
(c5!=3)&(c5!=5)&(c5!=7)
- the value in column 5 must be anything except 3, 5 or 7
(c5=2.2) | (((c5>5) & (c5<10)) & !((c5>6) & (c5<7)))
- the value in column 5 can be 2.2, or else it must lie
between 5 and 10 but must not lie between 6 and 7
So you see expressions can be made as simple or as
complicated as you like, and sub-clauses are contained in
brackets. In fact the brackets are not always necessary -
see the section below on operator precedence. Spaces
occurring in logical and arithmetic expressions are
ignored.
When the keyword ROW is used in a condition then the
number of the row itself is tested, so:
row<5
is true if the number of the row is less than 5, i.e. one
of the first four rows.
Contrast this to:
c1<5
which as we have already discussed means the rows for
which the value in the first column is less than 5.
There is no reason why references to rows and columns
cannot be combined in the same expression, e.g.:
(c1<10) & (row<5)
or even:
c1<row
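The difference between testing the row number and testing a value can be seen in a small Python sketch. The values in the first column are invented for the purpose, and rows are numbered from 1 as in the data table.

```python
# Invented values for the first column; rows are numbered from 1.
c1_values = [3.0, 12.0, 1.0, 8.0, 2.0, 4.0]
rows = list(enumerate(c1_values, start=1))  # pairs of (row number, c1)

first_four = [row for row, c1 in rows if row < 5]          # row<5
small_c1 = [row for row, c1 in rows if c1 < 5]             # c1<5
both = [row for row, c1 in rows if c1 < 10 and row < 5]    # (c1<10) & (row<5)
below_row = [row for row, c1 in rows if c1 < row]          # c1<row

print(first_four, small_c1, both, below_row)
```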
Note to users familiar with spreadsheets:
You will probably realise that these references to values
are what are usually called relative references: they
relate only to the columns of the row in question. There
is for example no direct way to compare values in an
expression to the value in the third column of the fifth
row, which would be an absolute reference. A partial exception,
allowing references to the values in the first valid row
only, is described in the section on the ARITHMETIC
command.
4:5. Arithmetic expressions
When we said C2-C3 in the first example, that was an
arithmetic expression meaning (obviously) the value in
column 2 minus the value in column 3. Many other
operations are available, and here is a complete list:
   x - y      x MINUS y
   x + y      x PLUS y
   x mod y    x MODULO y
   x / y      x DIVIDED BY y
   x * y      x TIMES y
   x pow y    x TO THE yTH POWER
   x pX y     PROBABILITY OF CHI-SQUARED x WITH y DEGREES OF FREEDOM
   x pT y     PROBABILITY OF STUDENT'S T x WITH y DEGREES OF FREEDOM
   - x        NEGATIVE x
   abs x      ABSOLUTE VALUE OF x
   pN x       ONE-TAILED CUMULATIVE PROBABILITY OF x IN A NORMAL DISTRIBUTION
   log x      LOG x BASE 10
   ln x       NATURAL LOG OF x (base e)
   exp x      e TO THE xTH POWER (natural antilog)
   lfact x    NATURAL LOG OF FACTORIAL x
   sin x      SIN x RADIANS
   arcsin x   ARCSIN x IN RADIANS
   cos x      COS x RADIANS
   arccos x   ARCCOS x IN RADIANS
   tan x      TAN x RADIANS
   arctan x   ARCTAN x IN RADIANS
Arithmetic expressions can be written out just as they
would appear on paper, again with brackets to clarify the
order of operations:
c3+c4*6
(c3+c4)*6
(c4*c5)pow3
(c4*c5*c6)pow(1/3)
The last example takes the geometric mean of columns 4,
5, and 6. It multiplies them together and then takes the
cube root of the product.
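As a check on the geometric mean example, the same calculation can be written in Python, where the ** operator plays the part of pow. The three values are invented, and the function name is hypothetical.

```python
import math


def geometric_mean_of_three(c4, c5, c6):
    """Cube root of the product, as in (c4*c5*c6)pow(1/3)."""
    return (c4 * c5 * c6) ** (1 / 3)


result = geometric_mean_of_three(2.0, 4.0, 8.0)
# The product is 64, whose cube root is 4 (up to floating point rounding).
print(math.isclose(result, 4.0))
```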
c13 pX 3
This example would take the value in column 13 to be a
chi-squared statistic with three degrees of freedom and
compute its probability, or p value. See the section on
the ARITHMETIC command for details of how to use EASISTAT
as a set of statistics tables to look up.
Numbers can be written for arithmetic expressions either
as ordinary decimal numbers (37, -45.236, 11.4) or using
exponential notation, where the mantissa is followed
immediately by an E and the power of ten to multiply it
by. Thus 3.3e5 is the same as 330000 and 2.7e-4 means
0.00027. The results of the ARITHMETIC command are output
in this format where appropriate.
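Python happens to read the same exponential notation, so the two examples above can be checked directly. This says nothing about how EASISTAT parses numbers internally; it only confirms the arithmetic.

```python
# Mantissa, then E, then the power of ten to multiply by.
large = float("3.3e5")
small = float("2.7e-4")

print(large == 330000.0)  # 3.3 times 10 to the 5th
print(small == 0.00027)   # 2.7 times 10 to the minus 4th
```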
4:6. Operator precedence
Operator precedence means the order in which logical or
arithmetic operations are performed. We are all familiar
with the idea that:
3+4/2
means three added to four-divided-by-two and not three-
plus-four divided by two, so that the answer is 5 and not
3.5.
Here is a complete list of operator precedences:
   x pX y    x pT y
   x pow y
   x / y     x * y
   x - y     x + y
   x < y     x > y     x <= y    x >= y
   x != y    x = y
   x & y
   x | y
The higher up an operator is, the sooner it is performed,
as if it was enclosed in a pair of brackets. Operators in
the same line are evaluated left to right, in the order
in which they are read. This means that writing:
c1+c2*c3+c4powc5*4pt15
is equivalent to writing:
c1 + (c2*c3) + ((c4powc5)*(4pt15))
but that writing:
c1-c2+c3
is equivalent to writing:
(c1-c2)+c3
If you want the operations carried out in any other order
then you have to explicitly specify it yourself using
brackets.
Note that you do not have to use brackets in the
expression:
(c5>5)&(c5<10)
because:
c5>5&c5<10
will be interpreted in exactly the same way - the > and <
take precedence over the &.
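For readers who like to see such rules in action, here is a small Python sketch of an evaluator that follows the precedence table above, using the technique usually called precedence climbing. It is not EASISTAT's parser: it handles only plain non-negative numbers, brackets and the binary operators, and it leaves out pX, pT, the unary operators and column references. The names LEVELS, PREC, APPLY and evaluate are all our own.

```python
import re

# Binary operators from the table above, tightest binding first.
LEVELS = [
    ["pow"],
    ["/", "*"],
    ["-", "+"],
    ["<=", ">=", "<", ">"],
    ["!=", "="],
    ["&"],
    ["|"],
]
PREC = {op: len(LEVELS) - i for i, ops in enumerate(LEVELS) for op in ops}

APPLY = {
    "pow": lambda a, b: a ** b,
    "/": lambda a, b: a / b,
    "*": lambda a, b: a * b,
    "-": lambda a, b: a - b,
    "+": lambda a, b: a + b,
    "<": lambda a, b: float(a < b),
    ">": lambda a, b: float(a > b),
    "<=": lambda a, b: float(a <= b),
    ">=": lambda a, b: float(a >= b),
    "!=": lambda a, b: float(a != b),
    "=": lambda a, b: float(a == b),
    "&": lambda a, b: float(a != 0 and b != 0),
    "|": lambda a, b: float(a != 0 or b != 0),
}

TOKEN = re.compile(r"\d+\.?\d*|pow|<=|>=|!=|[-+*/<>=&|()]")


def evaluate(text):
    """Evaluate an expression using the precedence levels listed above."""
    tokens = TOKEN.findall(text.replace(" ", "").lower())  # spaces are ignored
    pos = 0

    def primary():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = expression(1)
            pos += 1  # step over the closing bracket
            return value
        return float(token)

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while (pos < len(tokens) and tokens[pos] in PREC
               and PREC[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            # Asking for a strictly higher level on the right makes
            # operators on the same line evaluate left to right.
            right = expression(PREC[op] + 1)
            left = APPLY[op](left, right)
        return left

    return expression(1)


print(evaluate("3+4/2"))                                # 5.0, not 3.5
print(evaluate("6>5&6<10") == evaluate("(6>5)&(6<10)"))  # brackets not needed
```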
4:7. Combining arithmetic and logical operations
Not only can arithmetic and logical operations be
arbitrarily complex, they can also be combined together
in very flexible ways. Of course we have already seen
examples of this because "(c2-c3)<7" combines an
arithmetic expression, "c2-c3", with a logical one,
"result<7".
If this is the first time you are reading this manual
then you may want to skip this section and come back to
it later. Still here? Okay, here we go. The first point
is that arithmetic values also have a logical value,
which means that any number not equal to zero is
logically true and any number equal to zero is false. So:
1 is true
0 is false
4-3 is true
3-4 is true
3-3 is false
3|1.5|0 is true
3&1.5&0 is false
The second, more useful, point is that any logical
expression has an arithmetic value, which is 1 if it is
true and 0 if it is false. So:
3<4 = 1
3>4 = 0
(3<4)+(4<5) = 2
Which means we can write:
c3*(c2<c3) + c2*(c2>=c3)
to mean "whichever is higher of c2 and c3". If c3 is
higher then the value inside the first brackets will be
true, so 1. Then we multiply c3 by 1 and get c3. But the
value in the second brackets will be false, or 0, and c2
multiplied by 0 is 0 and the whole expression will
evaluate to c3*1+0, or c3. If c2 is higher than c3 or
equal to it then c2 will be given as the result.
Here's another one:
LN( C15*(C15>0) + 0.0001*(C15<=0) )
The idea of this is to take the natural log of c15. But
if c15 were less than or equal to zero then trying to
take its log would cause an error, so what this
expression in fact does is to take the natural log of c15
if it is greater than zero, but otherwise it takes the
natural log of 0.0001.
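Both tricks carry over directly to Python, where True and False also count as 1 and 0 in arithmetic, so they make a convenient way to try the idea out. The function names higher and safe_ln are hypothetical and the test values are invented.

```python
import math


def higher(c2, c3):
    """Whichever is higher of c2 and c3: c3*(c2<c3) + c2*(c2>=c3)."""
    return c3 * (c2 < c3) + c2 * (c2 >= c3)


def safe_ln(c15):
    """LN( C15*(C15>0) + 0.0001*(C15<=0) )."""
    return math.log(c15 * (c15 > 0) + 0.0001 * (c15 <= 0))


print(higher(2, 5), higher(5, 2), higher(3, 3))
print(safe_ln(-4.0) == math.log(0.0001))  # non-positive input falls back to 0.0001
```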
We hope this gives you an idea of the principles
involved. You can do almost anything with the operators
provided, though some things are a bit long-winded. More
examples will be given in individual command
descriptions, and especially in the descriptions of the
ARITHMETIC, MACRO and INPUT commands.
Incidentally, perhaps you can see now why we could not
just write:
5<c5<10
to specify that c5 should lie between 5 and 10. EASISTAT
would interpret the expression as follows: 5<c5 is either
true or false; if it is true it has a value of 1 and if
it is false it has a value of 0; whether it has a value
of 0 or 1 it will still be less than 10, so the
expression as a whole will always be true.
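The pitfall is easy to reproduce. The Python sketch below spells out the left-to-right reading described above with explicit brackets, next to the form that was actually intended. (Python itself treats an unbracketed 5<c5<10 specially, so the brackets matter here.) The function names are hypothetical.

```python
def as_easistat_reads_it(c5):
    """5<c5<10 read left to right: (5<c5) gives 1 or 0, then that is < 10."""
    return int(int(5 < c5) < 10)


def as_intended(c5):
    """(c5>5)&(c5<10): true only when c5 lies between 5 and 10."""
    return int((c5 > 5) and (c5 < 10))


for value in (0, 7, 50):
    print(value, as_easistat_reads_it(value), as_intended(value))
```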
4:8. General purpose variables
As well as the values in the data table, EASISTAT
provides a number of other variables which are not needed
for the standard statistical tests but which may be used
for a variety of other purposes and are particularly
useful for general mathematical manipulations. Firstly,
there are twenty general purpose variables which can be
set to any values and used in arithmetic and logical
expressions. These variables are referred to as V1,
V2...V20, but they can also be labelled like the data
columns and then referred to by these labels. The values
are set by the DERIVE command and the labels are set by
the LABEL command. For example if one wishes to have a
value for pi available in expressions one could enter:
LABEL V1 PI
DERIVE PI ARCCOS(-1)
This would set the label of the first general purpose
variable to be pi, and then set its value to be the
arccos of -1 in radians. It could thereafter be referred
to in expressions as pi, e.g.:
C2+PI*C3
These variables can be displayed with the command:
LIST VARIABLES
which can be shortened to:
L V
There are three special variables which are set by the
BASICS command, and these are referred to as XNUMBER,
XTOTAL and XMEAN. When the BASICS command is carried out,
XNUMBER is set to the number of items present (valid rows
used), XTOTAL is set to the total of the values used and
XMEAN is set to the mean of the values (which would be
XTOTAL/XNUMBER). The purpose of these variables is to
make possible the construction of other statistical tests
which may not be directly built in to EASISTAT. The
values obtained from the BASICS command can be used in
subsequent arithmetic expressions. An example of this is
given in the section on the INPUT command.
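The relationship between the three special variables can be written out in a few lines of Python. The column values are invented, and representing a missing entry as None is purely an assumption made for the sketch.

```python
# Invented column values; None stands in for a row with no valid value.
column = [4.0, None, 6.0, 11.0]

valid = [value for value in column if value is not None]

xnumber = len(valid)       # number of items present (valid rows used)
xtotal = sum(valid)        # total of the values used
xmean = xtotal / xnumber   # the mean, i.e. XTOTAL/XNUMBER

print(xnumber, xtotal, xmean)
```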
As an extension to this there are another twenty
variables which are set by the BASICS command and other
statistical commands. These are referred to as VV1,
VV2...VV20. How these are set for each command can be
discovered by displaying them after the command is
performed. They are displayed by the command:
LIST VV
So if for example one wishes to use the value of the
standard error of the mean as calculated by the BASICS
command one could enter the following:
BASICS C5
LIST VV
It then becomes clear that VV7 has been set to the
standard error. If one wished to make a permanent copy of
it one could then set one of the first twenty variables
to have that value, since the value of VV7 itself will be
changed by the next statistics command which is
performed:
DERIVE V1 VV7
4:9. Rules for label names
There are certain rules which must be followed for the
labels used so that expressions will be parsed correctly.
In general any combination of letters and digits can be
used for the column titles and the labels of variables,
but there are certain exceptions. Also note that there is
no difference between upper and lower case letters, and
that spaces in expressions are ignored.
a) Don't start any label with a digit e.g. 1stname, 2A,
etc.
b) Don't start any label with a C or a V or a VV followed
by a digit e.g. C2D, c40, v2, VV4xyz, etc.
c) Don't call any column or variable ROW, P, G, E, RANKS,
ZED, XNUMBER, XTOTAL or XMEAN.
d) Don't start any label with the name of a mathematical
function e.g. sint (uses SIN), pname (uses pN), etc. See
the full list of arithmetic operators above.
e) Don't start any label with the name of another column
or variable, i.e. if one column is called DA, then no
column or variable label must begin with the letters DA.
(Strictly speaking this rule only applies to columns to
the right of the first one, that is to say if that one
column is titled DA then no column to the right of it
should begin with DA.)
If you get any problems with syntax errors or expressions
not behaving as you think they should then come back and
check these rules. For example if you did have a column
called PNAME, then the expression sin(PNAME) would give
you a syntax error with AME, because the parser would
think that the PN referred to the normal probability
function. Similarly if you had a column labelled 4 then
the expression <4 would not mean "less than 4" but "less
than the value in the column labelled 4".
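Rules (a) to (e) can be turned into a rough checklist in Python, which may help when choosing labels. This is our own sketch, not something built into EASISTAT: the function name label_problem is hypothetical, and the list of function names assumes that every named operator in the arithmetic list (including mod, pow, pX and pT) counts for rule (d).

```python
import re

RESERVED = {"ROW", "P", "G", "E", "RANKS", "ZED",
            "XNUMBER", "XTOTAL", "XMEAN"}
FUNCTIONS = ["MOD", "POW", "PX", "PT", "ABS", "PN", "LOG", "LN", "EXP",
             "LFACT", "SIN", "ARCSIN", "COS", "ARCCOS", "TAN", "ARCTAN"]


def label_problem(label, existing=()):
    """Return the letter of the first rule the label breaks, or None."""
    name = label.upper()  # upper and lower case are not distinguished
    if name[:1].isdigit():
        return "a"  # starts with a digit
    if re.match(r"(VV|V|C)\d", name):
        return "b"  # looks like a column or variable reference
    if name in RESERVED:
        return "c"  # reserved word
    if any(name.startswith(function) for function in FUNCTIONS):
        return "d"  # starts with a function name
    if any(name.startswith(other.upper()) for other in existing):
        return "e"  # starts with another label
    return None


for candidate in ("1stname", "c40", "row", "pname", "AGE"):
    print(candidate, label_problem(candidate))
print("DATE", label_problem("DATE", existing=["DA"]))
```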