The World of Computer Software

home *** CD-ROM | disk | FTP | other *** search

/ The World of Computer Software / World_Of_Computer_Software-02-385-Vol-1of3.iso / e / estat21.zip / EASIDOC.ZIP / ESCHAP05.DOC < prev next >

Wrap

Text File | 1992-09-14 | 31KB | 926 lines

Chapter 5: DATA MANAGEMENT COMMANDS In this section we describe how to manipulate the data which EASISTAT uses. For details on how to read and write data files on disk see the section on file management commands and the DATA command. 5:1. Edit Format: Ed[it] EDIT is the command used to invoke EASISTAT's special data editor. This consists simply of a table of rows and columns into which data values are typed. Using the cursor keys you can move freely over this table and inspect, change or add values at will. Special keys are used to operate the editor in a way which does not apply to the rest of EASISTAT's commands. It is not possible to operate the editor from a command file (though it can be invoked from a command file) - the editor can only be operated directly from the keyboard. Examples: Select command - edit Select command - ED Select command - ed All the above commands will invoke the editor. Movement keys RIGHT ARROW - move one cell right LEFT ARROW - move one cell left UP ARROW - move one cell up DOWN ARROW - move one cell down TAB - move half page right SHIFT-TAB - move half page left PAGE UP - move half page up PAGE DOWN - move half page down HOME - move to left hand edge of screen END - move to right hand edge of screen CONTROL-PAGE UP - move to first row of data CONTROL-PAGE DOWN - move to last row of data CONTROL-HOME - move to first column of data CONTROL-END - move to last column of data Keys to add new row and columns ENTER - move down a row and go to first column of it, and if on last row create a new row below it CONTROL-RIGHT ARROW - if on last column, create a new column to the right CONTROL-N - create a new column in the middle of the table, to the right of the cursor 36 Data management commands Keys to change column format CONTROL-W - widens whole column CONTROL-D - increases number of decimal places displayed Special keys CONTROL-F - "fast" key: the cursor moves around the data but the screen is not redrawn, so one can move quickly to any location DELETE - deletes the contents of a cell BACKSPACE - deletes last letter typed ESCAPE - finishes editing, returns to main program So to enter your data using the editor proceed as follows: 1) When asked to "Select command -" enter ED. 2) Move the cursor to the titles row, which if the table is empty will say NEW. 3) Type in the title which you want, using CONTROL-W to make the column wider first if necessary. 4) Press CONTROL-RIGHT ARROW to make a new column to the right. 5) Keep repeating steps 3 and 4 until you have enough columns. (Note that every column must have a title, you cannot leave the title blank.) 6) Press ENTER to go to start of the next row of values. 7) Enter the correct value in a cell. 8) Press RIGHT ARROW to move to the next column. 9) Keep repeating steps 7 and 8 until you have filled the row. 10) Keep repeating steps 6, 7, 8 and 9 until you have entered all the data. 11) Press ESCAPE. 12) Use the DATA SAVE command to save your data to disk (eg enter DATA SAVE MYDATA.DAT, see the section on the DATA command for details). Note the following points: If you make a mistake just go back to the cell and type in the correct value. If the column is too narrow for your values then use CONTROL-W to widen it. 37 Data management commands If you want to have more decimal places displayed in a column then use CONTROL-D. Although the following functions can be performed by special EASISTAT commands, they cannot be performed from within the editor: deleting a column or row, adding a row in the middle of the table, narrowing a column or reducing the number of decimal places displayed, deriving the values in one column from another. If you wish to do any of these things then leave the editor first (by pressing ESCAPE) and then select the appropriate command. You can return to the editor later by selecting the EDIT command again. If when you try to use CONTROL-D you find that the column cannot be widened any further, it may be that you have more digits than you need displayed before the decimal point. If this is the case you will have to leave the editor (by pressing ESCAPE) and use the FORMAT command to adjust the number of decimal places displayed. 5:2. Titles Format: T[itles] This command outputs a list of the titles or labels of the columns of the data table. This is helpful to remind you which data is in which columns. Examples: Select command - t Select command - TITLE Select command - Ti All these commands will have the same effect. Example output: NO - c1 AGE - c2 SEX - c3 MSW - c4 OCC - c5 EMP - c6 UK - c7 ADD - c8 ETH - c9 ADM - c10 PC - c11 SYM - c12 TIM - c13 PD - c14 GHQ - c15 HDA - c16 HDD - c17 CIS - c18 A - c19 B - c20 C - c21 D - c22 TOT - c23 BCD - c24 DX1 - c25 DX2 - c26 SEV - c27 PPH - c28 ASK - c29 HLP - c30 DIF - c31 REF - c32 NOT - c33 Comments You make like to use TITLES in conjunction with the OUTPUT PRN command to print off a list of column titles that you can refer to while working with EASISTAT. Don't forget that if you insert or delete columns in the middle of your data set then that will change the values which the column numbers refer to. 38 Data management commands If nothing happens when you use the TITLES command, it's probably because you haven't given EASISTAT any data yet. 5:3. List Format: L[ist] [if condition or v[ariables] or vv] This command lists all the rows of data, or can be followed by an optional condition to list only some of the rows. Alternatively the values of the general purpose variables can be listed. Examples: Select command - l Select command - LI These will have the same effect of listing out all the rows. Select command - l if c1<5 This will only list out those rows in which the first value is less than 5. Select command - LIST IF ROW<5 This will only list out the first 4 rows. Example output: Listing data lines for which ROW<5 NO AGE SEX MSW OCC EMP UK ADD ETH ADM PC SYM TIM 1 3 1 2 2 1 1 3 1 1 0 1 2 2 1 1 1 2 2 1 2 1 2 0 1 3 3 3 1 2 3 1 1 3 1 2 0 1 2 4 1 1 1 5 2 1 2 1 2 0 1 3 Columns are listed according to the format currently assigned to that column in terms of column width and number of decimal places displayed. You can alter these with the FORMAT command. The two sets of general purpose variables can also be listed by using the option VARIABLES, which can be shortened down to V, or the option VV: Select command - LIST VAR Select command - LIST VV The first will list V1 to V20, the second will list VV1 to VV20. Example output: VV1 VV2 VV3 VV4 VV5 1.0000 100.0000 1.0000 50.5000 833.2500 VV6 VV7 VV8 VV9 VV10 28.8661 2.9011 0.0000 0.0000 -1.2361 VV11 VV12 VV13 VV14 VV15 0.0000 0.0000 0.0000 0.0000 0.0000 39 Data management commands VV16 VV17 VV18 VV19 VV20 0.0000 0.0000 0.0000 0.0000 0.0000 The LIST command can sometimes be useful to locate certain rows in your data set. For example, suppose that column 23 is supposed to be the total of columns 19, 20, 21 and 22, but that you think you might have entered some of the data wrongly. Then: Select command - l if c23!=(c19+c20+c21+c22) will output a list of any rows for which the total is wrong. If one column of those rows contains an identification number (or perhaps the row number itself) then one can make a note of which rows are incorrect and then use EDIT to go back to them and change them. Alternatively the LIST command might be used to locate the row containing a particular value, for example to locate any rows for which the value of GHQ is 23 then one could just enter: Select command - l if GHQ=23 5:4. New Format: Ne[w] [column or next [label [width dec]]] or [row [number]] Inserts a new column or row in the data table. If instead of a numbered column the keyword NEXT is used then a new column will be created to the right of the table. The column number can optionally be followed by a title for the new column, and this may optionally be followed by the desired width and number of decimal places to be displayed. If no title is given then the column will be given the name NEW. If no values are given for the width and number of decimal places then the defaults will be used (as set by the FORMAT command). If instead of a number after the keyword ROW the keyword NEXT is given then a new row will be added to the bottom of the table. The new column or row is filled with zeroes. Examples: Select command - new c2 Creates a new column in the second position. Select command - new c4 fred 6 3 Creates a new column in the fourth position, with title FRED and width 6 characters, displayed to 3 decimal places. Select command - new row 4 Creates a new row in the fourth position. 40 Data management commands Select command - ne row next Adds a new row on the bottom of the table. Select command - NEW Enter column to insert (NEXT for next available): c13 If no row or column is given EASISTAT assumes you want to insert a column and asks you what the position of it should be, so that the above sequence would insert a new column 13. 5:5. Delete Format: Del[ete] [column] or [row [number]] or [all] Deletes a row or column. All the other rows or columns are moved up or to the left one space to fill the gap. If the keyword ALL is given instead of a row or column then the whole data table is deleted. Examples: Select command - delete c2 Deletes second column. Select command - DEL ROW 4 Deletes fourth row. Select command - del row Enter row to delete: 100 If no column or row number is supplied EASISTAT will ask for one. This would delete row 100. 5:6. Label Format: La[bel] [column or variable [new name]] This is the command used to change the title appearing at the top of a column or the label of a variable. Examples: Select command - lab c2 Alog Now the new title for c2 is Alog. Select command - lab v4 theta Now the fourth general purpose variable can be referred to as THETA. Select command - label Enter column to label: c3 Current label is AGE, enter new label: fred 41 Data management commands If the column or the new label is missing EASISTAT will ask you to supply them. The third column, which did have the title AGE, now has the title FRED. The new title cannot be displayed if it is longer than the column width. If you wish to have a longer title, you should first increase the column width by using the FORMAT command. The title cannot contain spaces or non-ASCII characters (it cannot include the pound sign). Make sure you check the rules for valid labels at the end of the previous chapter, otherwise the parser may get confused. 5:7. Format Format: F[ormat] [column or default or variable [width decimals]] The FORMAT command allows changes in the way a column is displayed by EDIT, LIST and when the DATA SAVE command is used. The column width and number of decimal places displayed can be changed. If instead of a column the keyword DEFAULT is used, then all new columns will be created according to the specification given. The format of a variable can also be changed, to alter the number of decimal places displayed. Examples: Select command - f c2 6 3 This changes the format of the second column so that it is six characters wide and three decimal places be displayed after the decimal point, e.g. numbers will be shown as 23.319, 6.000. Select command - FO c3 Width is 4, enter new width or blank line for unchanged:5 Decimal places currently 2, enter new value or blank line for unchanged: 3 If the column or width and decimals parameters are omitted from the command line EASISTAT will request them. Select command - F V4 8 6 Variable 4 will now be displayed to six decimal places. Select command - FO DEF 5 2 Here the keyword DEFAULT is used so that from now on whenever the NEW command is used to insert a new column, the column will be displayed with two decimal places and a total width of five characters. When new data is read in from a data file all columns will be displayed with at least two decimal places. 42 Data management commands Comments It is vital to realise that when you save data to disk you are only saving the values to the precision of the current column format. All the time EASISTAT holds data in its memory it treats values as floating point numbers which are stored with a high precision. As soon as they are saved to disk however they are treated as digits as you see them displayed, so that if a fractional number is displayed with no decimal places then it will be saved as a whole number and the fractional part will be lost. The total column width must always be at least 2 wider than the number of decimal places, so that there is room for the "0." part of the number, as in 0.67. The column width must lie between 2 and 12, and the default column width must lie between 3 and 12. You will not be allowed to make the width too small for either the number or the column title. If values produced by the DERIVE command or read in by the DATA command would be too large to fit in the column with the format specified then the width is increased automatically. If these two numbers occur in the same column: 125.6 12.45 then you cannot have a column width of 5 and and display 2 decimal places because the numbers would appear like this: 125.60 12.45 so the total column width would have to be at least 6. 5:8. Derive Format: Der[ive] [column [expression]] This is a very powerful command which allows you to derive the values in one column from the values in others. One column can be set to be the total of three others, the maximum, the product, or can be produced by an arbitrarily complicated arithmetic and logical expression. In addition two special options are provided to make one column the rank of the values in another, or to be the normalised version of those values i.e. with the column mean subtracted and divided by the column standard deviation. The command can also be applied to variables and indeed is the only way that a variable can be set to a given value. In the format set out above "expression" can be an arithmetic and/or logical expression like (c3+c4)/2 or it can be one of these two special functions of a column, ZED or RANK: 43 Data management commands ZED column - (column - mean)/standard deviation RANK column - rank of the values in given the column Examples: Select command - der c3 c4-c5 This sets the values in column 3 to be those in column 4 minus those in column 5. Select command - DERIVE Enter column to derive: c3 Enter formula for new values: c4+c5 If you don't supply the column and expression on the command line then EASISTAT will ask for them. Here is third column becomes the sum of the fourth and fifth. Select command - der c6 zed c5 Values in column 5 have the column mean subtracted and are then divided by the standard deviation so that the values in column 6 will now have mean of zero and standard deviation of one. Select command - der c6 rank c5 Now the values in column 6 are derived from the ranks of those in column 5, so that the row containing the lowest value in column 5 will have a 1 in column 6, the next lowest a 2, and so on. Select command - der v3 arccos(-1)+v2 The third general purpose variable is set to the value of the second added to pi (the arccosine in radians of -1). Select command - der v3 c4 Now the third variable takes the value from the fourth column of the first valid row. When the DERIVE command is applied to a column, it is applied repeatedly to that column for every row of the data table. However when it is applied to a variable then any references to columns are taken to apply to the first valid row of the table. (Valid means as selected by the NARROW and WIDEN commands. This feature is similar to the usage of the ARITHMETIC command, where again column references apply to the first valid row.) Advanced tips Select command - der c6 (c3*c4*c5) pow (1/3) This makes column 6 the geometric mean of the previous three columns. Select command - der c6 c3*(c3>c4)+c4*(c4>=c3) Select command - der c6 c5*(c5>c6)+c6*(c6>=c5) 44 Data management commands These two lines would make column 6 the maximum of the previous three (if you don't see why refer back to the section on "Combining arithmetic and logical expressions"). Select command - der c6 ln(c5/(1-c5)) Here column 6 is derived from column 5 by the "logit" transformation which statisticians sometimes use. If you have one hundred rows of data the following lines will produce a table showing the probability value for Student's t with 5 degrees of freedom over a range from 0 to 10: Select command - new c1 Select command - der c1 row/10 Select command - fo c1 4 1 Select command - new c2 Select command - der c2 5 Select command - new c3 Select command - fo c3 7 5 Select command - der c3 c1 pt c2 Three new columns are created. The first consists of a series of numbers ascending from 0.1 to 10.0 (assuming there are 100 rows) in steps of 0.1. The second column is set to all 5's. The third is derived from the first two using the PT function. The first column is formatted to display one decimal place, and the third to display five. The DERIVE command can be used to put data into classes to make further manipulation easier. Suppose that column 3 contains values for age, then we can make column 4 contain a coding for up to 15, 16-25, 26-35, 36-45, 46-55 and 56+ like this: Select command - der c4 1*(c3<=15) + 2*(c3>15&c3<=25) Select command - der c4 c4 + 3*(c3>25&c3<=35) Select command - der c4 c4 + 4*(c3>35&c3<=45) Select command - der c4 c4 + 5*(c3>45&c3<=55) + 6*(c3>55) Again, referring to the section on "Combining arithmetic and logical operators" may be helpful if you do not understand this. Note the necessity to include column 4 itself in the second, third and fourth expressions, so as not to lose the result of the first one. The same effect can be achieved with the following sequence of NARROW and WIDEN commands. More commands are needed, but it may be clearer to see what is going on: Select command - narrow (c3<=15) Select command - der c4 1 Select command - widen Select command - narrow (c3>15&c3<=25) Select command - der c4 2 Select command - widen Select command - narrow (c3>25&c3<=35) Select command - der c4 3 Select command - widen Select command - narrow (c3>35&c3<=45) 45 Data management commands Select command - der c4 4 Select command - widen Select command - narrow (c3>45&c3<=55) Select command - der c4 5 Select command - widen Select command - narrow (c3>55) Select command - der c4 6 Select command - widen It is possible to number the data rows as follows: Select command - new c1 rnum Select command - derive rnum row A new first column is created called RNUM, and is filled with the value of each row of the table. 5:9. Sort Format: So[rt] [d[own]] [column] The SORT command arranges all the rows in order according to the values in the column specified. If the keyword DOWN (which can be abbreviated down to D) is given they are arranged in descending order, otherwise ascending. Examples: Select command - SORT C3 Select command - s d AGE Example output: Sorted into ascending order using C3 (SEX) Comments Only one column can be sorted at a time. If you want to sort rows into order according to two variables you must first produce a dummy variable from the first two. For example if the first column contains the month and the second column the day of the month, you could not sort the rows into date order by sorting first on c2 and then on c1, since the second sort would disturb the results of the first. Instead you would need to produce a combined date in a third column first by following this procedure: Select command - DERIVE C3 C1*100+C2 Select command - SORT C3 5:10. Narrow Format: N[arrow] condition The NARROW command narrows down the data set to include only those rows for which the given condition applies. All operations will subsequently only apply to the reduced data set except the following: EDIT, TITLES, NEW, DELETE, FORMAT. The data set remains narrowed until a WIDEN command is issued. 46 Data management commands Examples: Select command - NARROW C3=1 Only the rows whose first value is 1. Select command - n row<50 The first 49 rows. Select command - n c3!=9 & c4!=9 The rows which do not have a 9 in either the third or fourth column. Example output: Narrowing data set to include only lines for which C3=1 Comments Note that you must supply the condition in the command line. A NARROW command on its own will just be ignored. The NARROW command has a similar function to the IF option available for some commands, except that it can be used before commands which do not have an IF option. If a second NARROW command is used on a data set which has already been narrowed down, then the narrowed data set is narrowed still further, so that the only rows remaining are those for which both conditions are true. Note that NARROW affects all commands except those specified above. In particular the DERIVE command will only change values in rows which remain in the narrowed down data set and other rows will remain unchanged. An example of the way this can be used to code groups of values has been shown in the section on the DERIVE command. This feature also provides the only way to produce an absolute reference to individual cells in the table from commands rather than from the data editor, by narrowing the table down to just one row. Then one can set a given column of that row to a certain value using the DERIVE command, or can read a value from that column. Suppose for example you wished to find the mean of the means of columns 1, 2 and 3: Select command - wid Select command - bas c1 Select command - n row=1 Select command - der c4 xmean Select command - wid Select command - bas c2 Select command - n row=2 Select command - der c4 xmean Select command - wid Select command - bas c3 Select command - n row=3 Select command - der c4 xmean 47 Data management commands Select command - wid Select command - n row<=3 Select command - bas c4 The mean of c1 is put in the first row of c4, the mean of c2 in the second and the mean of c3 in the third. Finally the BASICS command is applied to all three values. This example is just to show the kind of things that can be done with EASISTAT if necessary. EASISTAT does not have built in support for missing data values. It is up to you to code a special value to indicate that data is missing in some rows of a given column. For example if the value of 9 is used for this and you wish to perform tests on the column but to exclude those subjects (rows) for whom data is missing, you would use the NARROW command to use only the rows having a value other than 9: Select command - n c4!=9 5:11. Widen Format: Wid[en] [condition] The WIDEN command restores the data set after a NARROW command. If no condition is given then all the the rows are included. If a condition is given, then all the rows previously included will remain, but any others that comply with the condition will be included in addition. Examples: Select command - WIDEN Includes all rows. Select command - wid c3=4 Adds in rows for which third entry is 4. Select command - n c2>7 ... ... Select command - wid c2<3 Now data set includes all rows whose second entry is greater than 7 or less than 3. Example output: Widening data set to include all lines Widening data set to include lines for which C2<7 Comment In practice it is quite rarely helpful to use WIDEN with a condition. Generally one just uses it on its own and then uses NARROW to reduce the data set again. The rows remaining after a NARROW and then a WIDEN command are 48 Data management commands those for which either condition is true. Note that the order in which the NARROW and WIDEN commands are issued is important. 5:12. Copy Format: Cop[y] [left, top, right, bottom, new-left, new-top] The COPY command copies a block of data from one part of the data table to another, and may occasionally be of use when it is necessary to reorganise data. It is designed to be used if the data is not arranged correctly for a particular analysis, for instance if the values in two different columns should actually be in the same column one under the other. Alternatively it may be possible to use the COPY command to merge data sets. For example if some new columns of values are to be added to a data table then it may be possible to edit the data file with a text editor and insert the new columns initially under the original columns in new rows. Then the file can be read in by EASISTAT, and the COPY command used to move the new block of data to the correct position to the right of the original columns. Example: Select command - copy 1,2,3,4,5,6 This copies a block from the first to third column and the second to fourth row so that the top left corner of the block moves to the fifth column and sixth row (the bottom right corner would be copied to the seventh column and eighth row). Note that the space must already exist in the table for the block to be copied to, no new rows or columns are created. Additionally the destination block must not overlap the source block. Single rows or columns can be copied by specifying the same value for the top and bottom row, or for the left and right column. 49