History of CODEBOOK.BAS by Jim Groeneveld:
vs. 0.0a, 19 October 1988: initial version, called UNFORMAT.BAS.
vs. 0.0b, 20 October 1988:
- added question for (missing) value to replace entirely blank
fields; this value may be alphanumeric. If it contains
blanks, commas, etc., it should be surrounded by double
quotes. Originally this value was fixed at -1.
- removed question for record length of input database file;
record length is now determined by the program and
only used to determine the end of a record.
Because it is determined for each record separately,
records may actually be of variable length, though each
should be long enough to read the specified columns from.
Space is reserved for a maximum record length of 1274.
- improved feedback on screen while processing cases.
- added question for (max.) number of variables per output file.
vs. 0.0c, 26 October 1988:
- added possibility to read comment lines NOT starting with a SPACE
- improved error report on field widths not matching columns:
the (STATGRAPHICS) variable name is now reported.
vs. 0.0d, 27 October 1988:
- added check for database file name with numerical extension,
which is reserved for output file names.
- added default values for "missing" value and number of variables.
- added check for existing output files and question whether to
overwrite them or not.
vs. 0.0e, 1 November 1988:
- corrected possible misinterpretations while 'reading' past EOL
and report of such occurrences.
- added check for unequal record lengths and appropriate report.
- added checks for illegal field widths and columns and report.
vs. 0.0f, 8 November 1988:
- added question for number of variables in order to reserve space
up to a number of 32767.
- added question for maximum record length in order to reserve
space and check for exceeding of this maximum. This maximum
may be any number up to 32767*255-1=8355584.
vs. 0.0g, 9 November 1988:
- added optional automatic adaptation of maximum record length
to actual maximum record length up to 32767*255-1=8355584.
The actual maximum record length (to determine the number
of data lines per record) is determined from the columns
to be read from the codebook file as well as during
reading the actual records.
- added optional automatic adaptation of maximum number of
variables to the actual number of variables up to a
maximum of 32767. This number is deduced from the codebook
file and is increased by 10 during the run whenever
necessary. This process, however, slows down execution
significantly with more than 100 variables.
- removed report of record length of first case.
- added report of minimum and maximum record lengths read.
vs. 0.1, 13 July 1989:
- changed original program name UNFORMAT.BAS into CODEBOOK.BAS.
- changed increment of 10 with auto-adapt to actual number of variables
into 100 (may be varied by changing a constant in the program source).
- corrected ability to use lengths and columns > 32767 up to 8355584 by
changing the relevant integer variables into single precision
variables. (Previously only values up to 32767 were actually possible.)
- corrected ability to use specific counts > 32767, practically without limit, by
changing the relevant integer variables into double precision
variables. (Previously only values up to 32767 were actually possible.)
- added optional removing of leading and trailing spaces of field values.
- added choice between BLANK or COMMA delimited output file(s).
- removed limit of 64 variables per output file (limit now is 32767).
- changed default of 10 variables per output file into 58 (for STATGRAPHICS).
vs. 0.2, 17 July 1989:
- added optional check for (case sensitive) identical variable names.
- added enclosing within single or double quotes of character values from
character variables with a single or double quote in the first column of
the corresponding description lines within the codebook file;
for use with values containing characters like spaces, commas and quotes.
Embedded quotes are doubled, but may not always be readable as such by
application programs; this is the user's own responsibility.
With this feature all possible character values can now be converted.
- some improved (more specific) error reports.
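
The quoting rule introduced in vs. 0.2 can be illustrated with a minimal
Python sketch. It is not taken from CODEBOOK.BAS: the function name
quote_value is hypothetical, and it assumes that only embedded copies of
the enclosing quote character are doubled.

```python
def quote_value(value: str, quote: str) -> str:
    """Enclose value in the given quote character.

    Assumption: only embedded copies of the enclosing quote
    character are doubled; other characters are left untouched.
    """
    if quote not in ("'", '"'):
        raise ValueError("quote must be a single or double quote character")
    return quote + value.replace(quote, quote * 2) + quote


if __name__ == "__main__":
    print(quote_value('He said "hi", then left', '"'))
    print(quote_value("it's", "'"))
```

As the entry above notes, whether a reading application interprets the
doubled quotes correctly depends on that application.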
vs. 0.3, 24 July 1989:
- added check for number of output files. Because that number will be the
extension of the output file, it may not exceed 999. It is calculated
from the total number of variables in the codebook file and the user
specified (maximum) number of variables per output file. If the number is
larger than 999 a minimum number of variables per output file will be
calculated and displayed.
- added warnings for time consuming garbage collection and auto-adaptation.
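
The output file count check of vs. 0.3 amounts to two ceiling divisions.
The following Python sketch is an illustration only; the names
output_file_count and min_vars_per_file are hypothetical and not taken
from the program source.

```python
MAX_OUTPUT_FILES = 999  # the file number becomes the extension (.1 to .999)


def output_file_count(total_vars: int, vars_per_file: int) -> int:
    """Number of output files needed (ceiling division)."""
    return -(-total_vars // vars_per_file)


def min_vars_per_file(total_vars: int) -> int:
    """Smallest number of variables per file that stays within 999 files."""
    return -(-total_vars // MAX_OUTPUT_FILES)


if __name__ == "__main__":
    total = 32767
    requested = 10
    files = output_file_count(total, requested)
    if files > MAX_OUTPUT_FILES:
        print("too many output files:", files)
        print("use at least", min_vars_per_file(total), "variables per file")
```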
vs. 0.4, 25 July 1989:
- added default responses for all possible prompts and changed some prompts.
- removed prompt for maximum record length. Maximum record length now is set
initially at a minimum value of 254 (MAX.LINE.INPUT.LENGTH-1) and is
adapted to the actual necessary length automatically deduced from the
codebook file. This length now only specifies the maximum column number
to interpret. Input records may now be of an 'infinite' length. The
remaining part of each record is processed, but not interpreted.
Additionally some single precision variables have necessarily been changed
into double precision variables.
- changed increment for automatic adaptation to actual number of variables
from 100 to the initial (negative, user specified) number of variables.
- added padding with spaces of values from incomplete fields (reading past EOL),
which may then be replaced by the missing value(s).
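
The handling of incomplete and blank fields can be sketched in Python as
follows. This is an illustration under assumptions, not the program's
code: read_field is a hypothetical name, columns are assumed to be
1-based, and the missing value defaults to -1 (the value that was fixed
in the earliest version).

```python
def read_field(record: str, start: int, width: int, missing: str = "-1") -> str:
    """Extract a fixed-column field from one record.

    Assumptions: start is a 1-based column number; a field that runs
    past the end of the record is padded with spaces; an entirely
    blank field is replaced by the missing value.
    """
    raw = record[start - 1:start - 1 + width]
    padded = raw.ljust(width)  # pad fields that read past EOL
    return missing if padded.strip(" ") == "" else padded


if __name__ == "__main__":
    print(repr(read_field("ABC12", 4, 2)))  # complete field
    print(repr(read_field("ABC12", 4, 4)))  # partly past EOL, padded
    print(repr(read_field("ABC", 4, 2)))    # entirely past EOL, missing
```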
vs. 1.0, 26 July 1989:
- added possibility of specifying a global missing value consisting of one
or more spaces.
- removed limit of 10 character variable names, limit now is 255!
- added an additional output file type: FIXED formatted (next to BLANK and
COMMA delimited) in which all values, the missing value and variable name
of one variable have the same output field width (if necessary truncated
from the left or right justified). All fields are contiguous. This offers
the possibility to extract values of a limited set of variables from an
original fixed formatted database file into another fixed formatted file.
The quote specification (the first column in the codebook file) is ignored.
- added another output file type: Report, a special case of
a Fixed formatted file, but with additional empty columns (1..9) between
the fields. These 'empty' columns are used where necessary to fit in a variable
name or missing value that is longer than the actual field width.
Additionally added prompt for page length in lines, default 60.
- made placement of a header line with variable names in the output files
optional, default present with BLANK and COMMA delimited and Report output
files and not present with FIXED formatted output files.
- completely redesigned and rewritten algorithm for file name checking,
which wasn't correct for subdirectory names; improved error report.
vs. 1.1, 21 December 1990:
- added additional check of legal field width based on starting and ending
columns of the field if the FIELD WIDTH isn't explicitly specified.
- corrected occasional, but severe bug when writing fixed formatted data.
- improved quality & quantity of the contents of the TESTDATA example files.
- changed filename checking to allow for extended characters in path/filename
specification (OK for DOS).
- added support for wildcard characters within filenames or empty filenames,
which implicitly request a directory listing of files.
- replaced author info (below) at a later date, without changing version
number and date.
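
The field width check added in vs. 1.1 can be pictured with the Python
sketch below. The name field_width and the exact rules are assumptions
for illustration: columns are taken to be 1-based and inclusive, and a
width is considered legal when it is at least 1 and, if given explicitly,
matches the columns.

```python
from typing import Optional


def field_width(start_col: int, end_col: int, width: Optional[int] = None) -> int:
    """Return the field width, deriving it from the columns if not given.

    Assumptions: 1-based inclusive columns; an explicit width must
    equal end_col - start_col + 1.
    """
    if start_col < 1 or end_col < start_col:
        raise ValueError("illegal columns: %d..%d" % (start_col, end_col))
    derived = end_col - start_col + 1
    if width is None:
        return derived
    if width != derived:
        raise ValueError("field width %d does not match columns %d..%d"
                         % (width, start_col, end_col))
    return width


if __name__ == "__main__":
    print(field_width(11, 18))     # width derived from the columns
    print(field_width(11, 18, 8))  # explicit width that matches
```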
Possible future features (if there is sufficient need):
+ inclusion of optional output of automatic CaseNumbers as the first variable
of each output file.
+ addition of optional additional (second) line (record) with missing values for
each variable, though I don't know of any programs using this info.
+ specification of maximum output record length instead of number of variables
per output file (as a negative value, default -640). For each output file the
maximum number of variables that will fit within this length will be
calculated from the maximum per variable of the actual record length, the
delimiter length, the lengths of the variable name and missing value and the
length of the extra spaces in Report output files. Requires many extra
calculations or extra array space remembering either the maximum field width
for each variable as described above or the number of variables in each of the
max. 999 output files. (The maximum field width may then also be used to
improve the automation of the generation of REPORT type output files.)
In this instance right justification might also be included.
+ specification of delimiting character in REPORT type output files (space,|).
+ specification of number of extra delimiting spaces within Report type output
files per variable in the codebook file (requires additional large array).
+ inclusion of optional page numbers, date and time per page of Report output.
+ counting the number (and calculating the fraction) of missing values
(contiguous spaces) for each variable.
+ recoding facilities other than for only blank fields (would require many
extra arrays that take up valuable memory space).
+ input of blank or comma delimited (or even report type) data files.
Centrum voor Medische Informatica TNO                   | <Email>                |
TNO Center for Medical Informatics  |                   | GROENEVELD@CMI.TNO.NL  |
( CMI-TNO )                         | Y. Groeneveld     | GROENEVELD@CMIHP1.UUCP | Jim Groeneveld
P.O.Box 124                         | Wassenaarseweg 56 | GROENEVELD@TNO.NL      | Schoolweg 14
2300 AC Leiden                      | 2333 AL Leiden    | ...@HDETNO51.BITNET    | 8071 BC Nunspeet
Nederland.                          | (+31|0)71-181810  | Fax (+31|0)71-176382   | 03412-60413