home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
The World of Computer Software
/
World_Of_Computer_Software-02-385-Vol-1of3.iso
/
e
/
estat21.zip
/
EASIDOC.ZIP
/
ESCHAP05.DOC
< prev
next >
Wrap
Text File
|
1992-09-14
|
31KB
|
926 lines
Chapter 5: DATA MANAGEMENT COMMANDS
In this section we describe how to manipulate the data
which EASISTAT uses. For details on how to read and write
data files on disk see the section on file management
commands and the DATA command.
5:1. Edit
Format: Ed[it]
EDIT is the command used to invoke EASISTAT's special
data editor. This consists simply of a table of rows and
columns into which data values are typed. Using the
cursor keys you can move freely over this table and
inspect, change or add values at will.
Special keys are used to operate the editor in a way
which does not apply to the rest of EASISTAT's commands.
It is not possible to operate the editor from a command
file (though it can be invoked from a command file) - the
editor can only be operated directly from the keyboard.
Examples:
Select command - edit
Select command - ED
Select command - ed
All the above commands will invoke the editor.
Movement keys
RIGHT ARROW - move one cell right
LEFT ARROW - move one cell left
UP ARROW - move one cell up
DOWN ARROW - move one cell down
TAB - move half page right
SHIFT-TAB - move half page left
PAGE UP - move half page up
PAGE DOWN - move half page down
HOME - move to left hand edge of screen
END - move to right hand edge of screen
CONTROL-PAGE UP - move to first row of data
CONTROL-PAGE DOWN - move to last row of data
CONTROL-HOME - move to first column of data
CONTROL-END - move to last column of data
Keys to add new row and columns
ENTER - move down a row and go to first
column of it, and if on last
row create a new row below it
CONTROL-RIGHT ARROW - if on last column, create a new
column to the right
CONTROL-N - create a new column in the middle
of the table, to the right of
the cursor
36
Data management commands
Keys to change column format
CONTROL-W - widens whole column
CONTROL-D - increases number of decimal places
displayed
Special keys
CONTROL-F - "fast" key: the cursor moves
around the data but the
screen is not redrawn, so one
can move quickly to any
location
DELETE - deletes the contents of a cell
BACKSPACE - deletes last letter typed
ESCAPE - finishes editing, returns to main
program
So to enter your data using the editor proceed as
follows:
1) When asked to "Select command -" enter ED.
2) Move the cursor to the titles row, which if the table
is empty will say NEW.
3) Type in the title which you want, using CONTROL-W to
make the column wider first if necessary.
4) Press CONTROL-RIGHT ARROW to make a new column to the
right.
5) Keep repeating steps 3 and 4 until you have enough
columns. (Note that every column must have a title, you
cannot leave the title blank.)
6) Press ENTER to go to start of the next row of values.
7) Enter the correct value in a cell.
8) Press RIGHT ARROW to move to the next column.
9) Keep repeating steps 7 and 8 until you have filled the
row.
10) Keep repeating steps 6, 7, 8 and 9 until you have
entered all the data.
11) Press ESCAPE.
12) Use the DATA SAVE command to save your data to disk
(eg enter DATA SAVE MYDATA.DAT, see the section on the
DATA command for details).
Note the following points:
If you make a mistake just go back to the cell and type
in the correct value.
If the column is too narrow for your values then use
CONTROL-W to widen it.
37
Data management commands
If you want to have more decimal places displayed in a
column then use CONTROL-D.
Although the following functions can be performed by
special EASISTAT commands, they cannot be performed from
within the editor: deleting a column or row, adding a row
in the middle of the table, narrowing a column or
reducing the number of decimal places displayed, deriving
the values in one column from another. If you wish to do
any of these things then leave the editor first (by
pressing ESCAPE) and then select the appropriate command.
You can return to the editor later by selecting the EDIT
command again.
If when you try to use CONTROL-D you find that the column
cannot be widened any further, it may be that you have
more digits than you need displayed before the decimal
point. If this is the case you will have to leave the
editor (by pressing ESCAPE) and use the FORMAT command to
adjust the number of decimal places displayed.
5:2. Titles
Format: T[itles]
This command outputs a list of the titles or labels of
the columns of the data table. This is helpful to remind
you which data is in which columns.
Examples:
Select command - t
Select command - TITLE
Select command - Ti
All these commands will have the same effect.
Example output:
NO - c1 AGE - c2 SEX - c3
MSW - c4 OCC - c5 EMP - c6
UK - c7 ADD - c8 ETH - c9
ADM - c10 PC - c11 SYM - c12
TIM - c13 PD - c14 GHQ - c15
HDA - c16 HDD - c17 CIS - c18
A - c19 B - c20 C - c21
D - c22 TOT - c23 BCD - c24
DX1 - c25 DX2 - c26 SEV - c27
PPH - c28 ASK - c29 HLP - c30
DIF - c31 REF - c32 NOT - c33
Comments
You make like to use TITLES in conjunction with the
OUTPUT PRN command to print off a list of column titles
that you can refer to while working with EASISTAT.
Don't forget that if you insert or delete columns in the
middle of your data set then that will change the values
which the column numbers refer to.
38
Data management commands
If nothing happens when you use the TITLES command, it's
probably because you haven't given EASISTAT any data yet.
5:3. List
Format: L[ist] [if condition or v[ariables] or vv]
This command lists all the rows of data, or can be
followed by an optional condition to list only some of
the rows. Alternatively the values of the general purpose
variables can be listed.
Examples:
Select command - l
Select command - LI
These will have the same effect of listing out all the rows.
Select command - l if c1<5
This will only list out those rows in which the first value is less than 5.
Select command - LIST IF ROW<5
This will only list out the first 4 rows.
Example output:
Listing data lines for which ROW<5
NO AGE SEX MSW OCC EMP UK ADD ETH ADM PC SYM TIM
1 3 1 2 2 1 1 3 1 1 0 1 2
2 1 1 1 2 2 1 2 1 2 0 1 3
3 3 1 2 3 1 1 3 1 2 0 1 2
4 1 1 1 5 2 1 2 1 2 0 1 3
Columns are listed according to the format currently
assigned to that column in terms of column width and
number of decimal places displayed. You can alter these
with the FORMAT command.
The two sets of general purpose variables can also be
listed by using the option VARIABLES, which can be
shortened down to V, or the option VV:
Select command - LIST VAR
Select command - LIST VV
The first will list V1 to V20, the second will list VV1
to VV20.
Example output:
VV1 VV2 VV3 VV4 VV5
1.0000 100.0000 1.0000 50.5000 833.2500
VV6 VV7 VV8 VV9 VV10
28.8661 2.9011 0.0000 0.0000 -1.2361
VV11 VV12 VV13 VV14 VV15
0.0000 0.0000 0.0000 0.0000 0.0000
39
Data management commands
VV16 VV17 VV18 VV19 VV20
0.0000 0.0000 0.0000 0.0000 0.0000
The LIST command can sometimes be useful to locate
certain rows in your data set. For example, suppose that
column 23 is supposed to be the total of columns 19, 20,
21 and 22, but that you think you might have entered some
of the data wrongly. Then:
Select command - l if c23!=(c19+c20+c21+c22)
will output a list of any rows for which the total is
wrong. If one column of those rows contains an
identification number (or perhaps the row number itself)
then one can make a note of which rows are incorrect and
then use EDIT to go back to them and change them.
Alternatively the LIST command might be used to locate
the row containing a particular value, for example to
locate any rows for which the value of GHQ is 23 then one
could just enter:
Select command - l if GHQ=23
5:4. New
Format: Ne[w] [column or next [label [width dec]]] or [row [number]]
Inserts a new column or row in the data table. If instead
of a numbered column the keyword NEXT is used then a new
column will be created to the right of the table. The
column number can optionally be followed by a title for
the new column, and this may optionally be followed by
the desired width and number of decimal places to be
displayed. If no title is given then the column will be
given the name NEW. If no values are given for the width
and number of decimal places then the defaults will be
used (as set by the FORMAT command).
If instead of a number after the keyword ROW the keyword
NEXT is given then a new row will be added to the bottom
of the table. The new column or row is filled with
zeroes.
Examples:
Select command - new c2
Creates a new column in the second position.
Select command - new c4 fred 6 3
Creates a new column in the fourth position, with title
FRED and width 6 characters, displayed to 3 decimal
places.
Select command - new row 4
Creates a new row in the fourth position.
40
Data management commands
Select command - ne row next
Adds a new row on the bottom of the table.
Select command - NEW
Enter column to insert (NEXT for next available): c13
If no row or column is given EASISTAT assumes you want to
insert a column and asks you what the position of it
should be, so that the above sequence would insert a new
column 13.
5:5. Delete
Format: Del[ete] [column] or [row [number]] or [all]
Deletes a row or column. All the other rows or columns
are moved up or to the left one space to fill the gap. If
the keyword ALL is given instead of a row or column then
the whole data table is deleted.
Examples:
Select command - delete c2
Deletes second column.
Select command - DEL ROW 4
Deletes fourth row.
Select command - del row
Enter row to delete: 100
If no column or row number is supplied EASISTAT will ask
for one. This would delete row 100.
5:6. Label
Format: La[bel] [column or variable [new name]]
This is the command used to change the title appearing at
the top of a column or the label of a variable.
Examples:
Select command - lab c2 Alog
Now the new title for c2 is Alog.
Select command - lab v4 theta
Now the fourth general purpose variable can be referred
to as THETA.
Select command - label
Enter column to label: c3
Current label is AGE, enter new label: fred
41
Data management commands
If the column or the new label is missing EASISTAT will
ask you to supply them. The third column, which did have
the title AGE, now has the title FRED.
The new title cannot be displayed if it is longer than
the column width. If you wish to have a longer title, you
should first increase the column width by using the
FORMAT command.
The title cannot contain spaces or non-ASCII characters
(it cannot include the pound sign). Make sure you check
the rules for valid labels at the end of the previous
chapter, otherwise the parser may get confused.
5:7. Format
Format: F[ormat] [column or default or variable [width
decimals]]
The FORMAT command allows changes in the way a column is
displayed by EDIT, LIST and when the DATA SAVE command is
used. The column width and number of decimal places
displayed can be changed. If instead of a column the
keyword DEFAULT is used, then all new columns will be
created according to the specification given. The format
of a variable can also be changed, to alter the number of
decimal places displayed.
Examples:
Select command - f c2 6 3
This changes the format of the second column so that it
is six characters wide and three decimal places be
displayed after the decimal point, e.g. numbers will be
shown as 23.319, 6.000.
Select command - FO c3
Width is 4, enter new width or blank line for unchanged:5
Decimal places currently 2, enter new value or blank line
for unchanged: 3
If the column or width and decimals parameters are
omitted from the command line EASISTAT will request them.
Select command - F V4 8 6
Variable 4 will now be displayed to six decimal places.
Select command - FO DEF 5 2
Here the keyword DEFAULT is used so that from now on
whenever the NEW command is used to insert a new column,
the column will be displayed with two decimal places and
a total width of five characters. When new data is read
in from a data file all columns will be displayed with at
least two decimal places.
42
Data management commands
Comments
It is vital to realise that when you save data to disk
you are only saving the values to the precision of the
current column format. All the time EASISTAT holds data
in its memory it treats values as floating point numbers
which are stored with a high precision. As soon as they
are saved to disk however they are treated as digits as
you see them displayed, so that if a fractional number is
displayed with no decimal places then it will be saved as
a whole number and the fractional part will be lost.
The total column width must always be at least 2 wider
than the number of decimal places, so that there is room
for the "0." part of the number, as in 0.67. The column
width must lie between 2 and 12, and the default column
width must lie between 3 and 12.
You will not be allowed to make the width too small for
either the number or the column title. If values produced
by the DERIVE command or read in by the DATA command
would be too large to fit in the column with the format
specified then the width is increased automatically.
If these two numbers occur in the same column:
125.6
12.45
then you cannot have a column width of 5 and and display
2 decimal places because the numbers would appear like
this:
125.60
12.45
so the total column width would have to be at least 6.
5:8. Derive
Format: Der[ive] [column [expression]]
This is a very powerful command which allows you to
derive the values in one column from the values in
others. One column can be set to be the total of three
others, the maximum, the product, or can be produced by
an arbitrarily complicated arithmetic and logical
expression. In addition two special options are provided
to make one column the rank of the values in another, or
to be the normalised version of those values i.e. with
the column mean subtracted and divided by the column
standard deviation.
The command can also be applied to variables and indeed
is the only way that a variable can be set to a given
value.
In the format set out above "expression" can be an
arithmetic and/or logical expression like (c3+c4)/2 or it
can be one of these two special functions of a column,
ZED or RANK:
43
Data management commands
ZED column - (column - mean)/standard deviation
RANK column - rank of the values in given the column
Examples:
Select command - der c3 c4-c5
This sets the values in column 3 to be those in column 4
minus those in column 5.
Select command - DERIVE
Enter column to derive: c3
Enter formula for new values: c4+c5
If you don't supply the column and expression on the
command line then EASISTAT will ask for them. Here is
third column becomes the sum of the fourth and fifth.
Select command - der c6 zed c5
Values in column 5 have the column mean subtracted and
are then divided by the standard deviation so that the
values in column 6 will now have mean of zero and
standard deviation of one.
Select command - der c6 rank c5
Now the values in column 6 are derived from the ranks of
those in column 5, so that the row containing the lowest
value in column 5 will have a 1 in column 6, the next
lowest a 2, and so on.
Select command - der v3 arccos(-1)+v2
The third general purpose variable is set to the value of
the second added to pi (the arccosine in radians of -1).
Select command - der v3 c4
Now the third variable takes the value from the fourth
column of the first valid row. When the DERIVE command is
applied to a column, it is applied repeatedly to that
column for every row of the data table. However when it
is applied to a variable then any references to columns
are taken to apply to the first valid row of the table.
(Valid means as selected by the NARROW and WIDEN
commands. This feature is similar to the usage of the
ARITHMETIC command, where again column references apply
to the first valid row.)
Advanced tips
Select command - der c6 (c3*c4*c5) pow (1/3)
This makes column 6 the geometric mean of the previous
three columns.
Select command - der c6 c3*(c3>c4)+c4*(c4>=c3)
Select command - der c6 c5*(c5>c6)+c6*(c6>=c5)
44
Data management commands
These two lines would make column 6 the maximum of the
previous three (if you don't see why refer back to the
section on "Combining arithmetic and logical
expressions").
Select command - der c6 ln(c5/(1-c5))
Here column 6 is derived from column 5 by the "logit"
transformation which statisticians sometimes use.
If you have one hundred rows of data the following lines
will produce a table showing the probability value for
Student's t with 5 degrees of freedom over a range from 0
to 10:
Select command - new c1
Select command - der c1 row/10
Select command - fo c1 4 1
Select command - new c2
Select command - der c2 5
Select command - new c3
Select command - fo c3 7 5
Select command - der c3 c1 pt c2
Three new columns are created. The first consists of a
series of numbers ascending from 0.1 to 10.0 (assuming
there are 100 rows) in steps of 0.1. The second column is
set to all 5's. The third is derived from the first two
using the PT function. The first column is formatted to
display one decimal place, and the third to display five.
The DERIVE command can be used to put data into classes
to make further manipulation easier. Suppose that column
3 contains values for age, then we can make column 4
contain a coding for up to 15, 16-25, 26-35, 36-45, 46-55
and 56+ like this:
Select command - der c4 1*(c3<=15) + 2*(c3>15&c3<=25)
Select command - der c4 c4 + 3*(c3>25&c3<=35)
Select command - der c4 c4 + 4*(c3>35&c3<=45)
Select command - der c4 c4 + 5*(c3>45&c3<=55) + 6*(c3>55)
Again, referring to the section on "Combining arithmetic
and logical operators" may be helpful if you do not
understand this. Note the necessity to include column 4
itself in the second, third and fourth expressions, so as
not to lose the result of the first one.
The same effect can be achieved with the following
sequence of NARROW and WIDEN commands. More commands are
needed, but it may be clearer to see what is going on:
Select command - narrow (c3<=15)
Select command - der c4 1
Select command - widen
Select command - narrow (c3>15&c3<=25)
Select command - der c4 2
Select command - widen
Select command - narrow (c3>25&c3<=35)
Select command - der c4 3
Select command - widen
Select command - narrow (c3>35&c3<=45)
45
Data management commands
Select command - der c4 4
Select command - widen
Select command - narrow (c3>45&c3<=55)
Select command - der c4 5
Select command - widen
Select command - narrow (c3>55)
Select command - der c4 6
Select command - widen
It is possible to number the data rows as follows:
Select command - new c1 rnum
Select command - derive rnum row
A new first column is created called RNUM, and is filled
with the value of each row of the table.
5:9. Sort
Format: So[rt] [d[own]] [column]
The SORT command arranges all the rows in order according
to the values in the column specified. If the keyword
DOWN (which can be abbreviated down to D) is given they
are arranged in descending order, otherwise ascending.
Examples:
Select command - SORT C3
Select command - s d AGE
Example output:
Sorted into ascending order using C3 (SEX)
Comments
Only one column can be sorted at a time. If you want to
sort rows into order according to two variables you must
first produce a dummy variable from the first two. For
example if the first column contains the month and the
second column the day of the month, you could not sort
the rows into date order by sorting first on c2 and then
on c1, since the second sort would disturb the results of
the first. Instead you would need to produce a combined
date in a third column first by following this procedure:
Select command - DERIVE C3 C1*100+C2
Select command - SORT C3
5:10. Narrow
Format: N[arrow] condition
The NARROW command narrows down the data set to include
only those rows for which the given condition applies.
All operations will subsequently only apply to the
reduced data set except the following: EDIT, TITLES, NEW,
DELETE, FORMAT. The data set remains narrowed until a
WIDEN command is issued.
46
Data management commands
Examples:
Select command - NARROW C3=1
Only the rows whose first value is 1.
Select command - n row<50
The first 49 rows.
Select command - n c3!=9 & c4!=9
The rows which do not have a 9 in either the third or
fourth column.
Example output:
Narrowing data set to include only lines for which C3=1
Comments
Note that you must supply the condition in the command
line. A NARROW command on its own will just be ignored.
The NARROW command has a similar function to the IF
option available for some commands, except that it can be
used before commands which do not have an IF option.
If a second NARROW command is used on a data set which
has already been narrowed down, then the narrowed data
set is narrowed still further, so that the only rows
remaining are those for which both conditions are true.
Note that NARROW affects all commands except those
specified above. In particular the DERIVE command will
only change values in rows which remain in the narrowed
down data set and other rows will remain unchanged. An
example of the way this can be used to code groups of
values has been shown in the section on the DERIVE
command.
This feature also provides the only way to produce an
absolute reference to individual cells in the table from
commands rather than from the data editor, by narrowing
the table down to just one row. Then one can set a given
column of that row to a certain value using the DERIVE
command, or can read a value from that column. Suppose
for example you wished to find the mean of the means of
columns 1, 2 and 3:
Select command - wid
Select command - bas c1
Select command - n row=1
Select command - der c4 xmean
Select command - wid
Select command - bas c2
Select command - n row=2
Select command - der c4 xmean
Select command - wid
Select command - bas c3
Select command - n row=3
Select command - der c4 xmean
47
Data management commands
Select command - wid
Select command - n row<=3
Select command - bas c4
The mean of c1 is put in the first row of c4, the mean of
c2 in the second and the mean of c3 in the third. Finally
the BASICS command is applied to all three values. This
example is just to show the kind of things that can be
done with EASISTAT if necessary.
EASISTAT does not have built in support for missing data
values. It is up to you to code a special value to
indicate that data is missing in some rows of a given
column. For example if the value of 9 is used for this
and you wish to perform tests on the column but to
exclude those subjects (rows) for whom data is missing,
you would use the NARROW command to use only the rows
having a value other than 9:
Select command - n c4!=9
5:11. Widen
Format: Wid[en] [condition]
The WIDEN command restores the data set after a NARROW
command. If no condition is given then all the the rows
are included. If a condition is given, then all the rows
previously included will remain, but any others that
comply with the condition will be included in addition.
Examples:
Select command - WIDEN
Includes all rows.
Select command - wid c3=4
Adds in rows for which third entry is 4.
Select command - n c2>7
...
...
Select command - wid c2<3
Now data set includes all rows whose second entry is
greater than 7 or less than 3.
Example output:
Widening data set to include all lines
Widening data set to include lines for which C2<7
Comment
In practice it is quite rarely helpful to use WIDEN with
a condition. Generally one just uses it on its own and
then uses NARROW to reduce the data set again. The rows
remaining after a NARROW and then a WIDEN command are
48
Data management commands
those for which either condition is true. Note that the
order in which the NARROW and WIDEN commands are issued
is important.
5:12. Copy
Format: Cop[y] [left, top, right, bottom, new-left,
new-top]
The COPY command copies a block of data from one part of
the data table to another, and may occasionally be of use
when it is necessary to reorganise data. It is designed
to be used if the data is not arranged correctly for a
particular analysis, for instance if the values in two
different columns should actually be in the same column
one under the other. Alternatively it may be possible to
use the COPY command to merge data sets. For example if
some new columns of values are to be added to a data
table then it may be possible to edit the data file with
a text editor and insert the new columns initially under
the original columns in new rows. Then the file can be
read in by EASISTAT, and the COPY command used to move
the new block of data to the correct position to the
right of the original columns.
Example:
Select command - copy 1,2,3,4,5,6
This copies a block from the first to third column and
the second to fourth row so that the top left corner of
the block moves to the fifth column and sixth row (the
bottom right corner would be copied to the seventh column
and eighth row).
Note that the space must already exist in the table for
the block to be copied to, no new rows or columns are
created. Additionally the destination block must not
overlap the source block. Single rows or columns can be
copied by specifying the same value for the top and
bottom row, or for the left and right column.
49