home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
C!T ROM 5
/
ctrom5b.zip
/
ctrom5b
/
DOS
/
DATABASE
/
CONVR410
/
CONVERT.DOC
< prev
next >
Wrap
Text File
|
1994-10-15
|
22KB
|
469 lines
CONVERT.DOC
10/15/94
The CONVERT.EXE program converts data between several basic data formats.
Features:
* You can read data in from any of the following data formats:
ASCII-delimited
fixed field
dBase-compatible
* You can write data out in any of the following data formats:
ASCII-delimited
fixed field
dBase-compatible
WKS (Lotus 1-2-3 release 1 compatible)
* For ASCII-delimited files, you can specify the delimiters used between
fields as well as around numeric or character data
* For dBase input files, you can retain deleted records if you want
* You can resize variables or drop them entirely if desired
* You can specify up to 10 include filters; all records processed must meet
at least one of these filter conditions
* You can specify up to 10 exclude filters; any records which meet these
filter conditions are dropped
* For WKS output files, you can process input files bigger than Lotus
1-2-3 itself can handle (1-2-3 limits input records to being 240 characters
or less
Program written by:
Bruce Guthrie
Room H-4885
U.S. Dept of Commerce/ESA/OBA/BSISD
Washington, D.C. 20230
(202) 482-3234
You may freely copy and re-distribute this program; however, the U.S.
Department of Commerce neither guarantees nor assures compatibility of the
program with all computer software or hardware.
Foreign users: Please provide an Internet e-mail address in all correspondence
or and just e-mail your problems to me at bgu@cu.nih.gov
Data format types:
An ASCII-delimited file is one which typically has double quotes around each
character field (the quotes are optional in CONVERT) and typically commas
between fields. Leading and trailing spaces are removed from character as well
as all other values.
A fixed-field file places each field in the same column positions on each
record. The lengths of the fields are the same from record to record.
As an example, these might be an ASCII-delimited records:
"Economic Bulletin Board","202 482-3870",35
"EBB High-Speed","202 482-2584",100
Fixed-field file records might look like this for the same data:
Economic Bulletin Board 202 482-3870 35
EBB High-Speed 202 482-2584 100
WKS files are supported directly by Lotus 1-2-3 (all versions) as well as most
other spreadsheet programs. DBF files are supported by dBaseIII, dBaseIII+,
dBaseIV, and most other data base management programs (Paradox etc).
Field-definition file:
Unless you are reading a dBase file, this program requires a field-definition
file to figure out the characteristics for each field and also to set certain
file characteristics. If you're processing an ASCII-delimited input file, the
routine can try to create a field-definition file for you if desired.
The field-definition file can be created with any text editor. The
field-definition file consists of several records with the following fields
separated by spaces. Except for the record type indicator (which must begin in
column 1), all other fields can be placed in any columns:
(1) record type (see below)
(2) length of field on input
(3) number of decimal places for numeric data (if you don't know,
put a "?" here; for non-numeric data, a "0" is fine) on output
(4) length of field on output
Any characters after the field length are treated as comment fields. You would
typically use this to enter the field name or column position or any other
information of use to you.
The data record types accepted by this routine are as follows:
type C = character data
N = numeric
L = logical (T or F)
D = date (in yyyymmdd format)
The data fields should be in the order the fields are found in the source file.
You may also specify up to 10 include filters and up to 10 exclude filters in
the field-definition file. If an include filter is specified, the input record
must contain at least one of the specified character strings in order to be
processed. If an exclude filter is specified, any input record which contains
any of the specified character strings will be ignored. Filters are case
sensitive (capitalization matters) and processed as "or" items (if any filter is
met, the condition is met; filters are not combined to determine fulfillment).
The filters are specified in the field-definition file in the following ways:
/+=string include filter, the string "string" can appear anywhere
/S+=string include filter, the string "string" is at the beginning
of the record
/+S=string include filter, the string "string" is at the end of the
record
/-=string exclude filter, the string "string" can appear anywhere
/S-=string exclude filter, the string "string" is at the beginning
of the record
/-S=string exclude filter, the string "string" is at the end of the
record
For example, if you want to specify in your control file that you only want
records that contain either "Japan" or "France" *and* you want to exclude any
records that begin with an underscore character, you would need to include the
following three filter statements:
/+=Japan
/+=France
/S-=_
The character string can include hexadecimal codes (in the &Hxx format) or
decimal codes (in the \ddd format) if necessary.
Note that for fixed field files, you have to account for every byte in the file.
If you have something like this:
12345678_1_2345678_2_2345678_3 (column positions)
APPLE X Y 12 BANANAS
Even though you may think you only have five fields, the following .DEF file
will NOT work:
; Bad .DEF file: Note does not account for blank spaces
C 8 0 8 Fruit1
C 1 0 1 Class1
C 1 0 1 Class2
N 5 0 5 Value
C 11 0 11 Fruit2
You may want the Fruit1 field to be in columns 1 through 8 and Class1 to be in
column 10 but the routine will not know to skip column 9 so it will start
reading Class1 beginning in column 9, Class2 beginning in column 10, etc. To
drop the blank positions, you have to add dummy fields on input and ask for them
to be dropped on output:
; Good .DEF file: Spaces between fields are accounted for
C 8 0 8 Fruit1
C 1 0 0 Filler
C 1 0 1 Class1
C 1 0 0 Filler
C 1 0 1 Class2
C 1 0 0 Filler
N 5 0 5 Value
C 1 0 0 Filler
C 11 0 11 Fruit2
You can also use the input field length and output field lengths to either drop
fields using other formats (by specifying a zero length for the output field
length) or for creating fields on output (by specifying a zero length for the
input field length). You can also use this to expand on contract a field. For
example, if Fruit1 is 8 characters long but you only want it to occupy 4
characters on output (thus the field would be truncated), specify 8 for the
input field length and 4 for the output field length.
You can have the routine create the field-definition file it's using for dBase
and ASCII-delimited files. This is controlled by the /OUTDEF=deffile and
/-OUTDEF parameters.
The CONVERT.INI file:
CONVERT will read a CONVERT.INI file if one is found. (You can specify a
different file name if desired.) The file is an ASCII text file that can be
created maintained by hand. The file can consist or one or more command line
parameters (only those that begin with a "/"), one statement per line.
The file can also contain comments which are blank lines or any line beginning
with:
; (semi-colon)
: (colon)
' (quote)
CONVERT looks for the initialization file in your default subdirectory first.
It then searches for it in the subdirectory where the executable was and then
goes through your DOS path.
Passing in "/-I" or "/INULL" skips loading the CONVERT.INI file.
CONFIGWS.EXE:
In addition to using the CONVERT.INI file, you can permanently change some
defaults within CONVERT.EXE by using the CONFIGWS.EXE program. CONFIGWS.EXE is
not included in the CONVERT distribution package but previous versions can
typically be used with CONVERT if desired. (The same CONFIGWS.EXE program can
be used to path a number of Wayne Software programs.) If CONFIGWS can't patch
the executable, it will notify you of this before altering anything.
CONFIGWS.EXE allows you to set the following defaults:
Output format: /TO FIXED, /TO ASCII, /TO WKS, or /TO DBF
If bad data: /SKIP, /MISSING, or /ABORT
Noise: /BEEP or /-BEEP
Output exists: /OVERASK, /OVERWRITE, /-OVERWRITE, /APPEND
When setting options, the program will process defaults in the following order:
(1) CONFIGWS.EXE-settable settings
(2) CONVERT.INI settings
(3) command-line settings
The *last* settings encountered win.
Syntax:
CONVERT infile [ outfile [ deffile ] ]
[ /INDEF=deffile | /-INDEF ] [ /OUTDEF=deffile | /-OUTDEF ]
[ /OVERWRITE | /-OVERWRITE | /APPEND | /OVERASK ] [ /Iinitfile | /-I ]
[ /FROM FIXED | /FROM ASCII | /FROM DBF ] [ /DELETED | /-DELETED ]
[ /TO FIXED | /TO ASCII | /TO WKS | /TO DBF ] [ /HEADER | /-HEADER ]
[ /DELIMS=aroundstrings,aroundnums,betweenfields ] [ /BEEP | /-BEEP ]
[ /INMISS=val ] [ /INMISSC=val ] [ /OUTMISS=val ] [ /OUTMISSC=val ]
[ /SKIP | /MISSING | /ABORT ] [ /FIRSTOBS=n ] [ /LASTOBS=n ]
[ /NULLS | /-NULLS ] [ /Q ] [ /? | /?&H ]
"infile" is the file specification for the ASCII-delimited or fixed-field file
you want converted. You can specify a drive and path specification if
necessary. This parameter is required.
"outfile" is the file specification of the file you want to create. You can
specify a drive and path specification if necessary. If no outfile is provided,
the routine will presume you want the output file called the same thing as the
infile but you want the extension to be ".FIX" (if the output file is a
fixed-field file), ".PRN" (if the output file is ASCII-delimited), or ".WKS" (if
the output file is to be in a WKS format).
"deffile" is the file specification for the input field-definition file. You
can specify a drive and path specification if necessary. If no deffile is
specifically provided, the routine will presume it is called the same thing as
the infile but it has the extension of ".DEF". Note that the deffile can only
be provided if you also specify the outfile name.
"/INDEF=deffile" provides the name of the field-definition file to be read by
the program. If no deffile is specifically provided and you're using FROM
ASCII, the routine will presume the field-definition file exists and is named
the same thing as the infile but it has the extension of ".DEF". A deffile has
to be specifically provided for FROM FIXED files if one is desired.
"/-INDEF" says there is no field-definition file. This is the default if you're
using FROM DBF or FROM FIXED.
"/OUTDEF=deffile" provides the name of the output file that you want the program
to write the field-definition file to. This is useful in cases where you did
*not* use a field-definition file on input since it allows you to see and
possibly modify the field-definition file for next time. If no deffile is
provided and you're using FROM DBF, the routine will presume the
field-definition file is named the same thing as the infile but it has an
extension of ".DEF".
"/-OUTDEF" says to skip the creation of the field-definition file. This is the
default for ASCII-delimited and fixed-field files.
"/OVERWRITE" says to overwrite the output file if it exists already.
"/-OVERWRITE" says to abort if the output file exists already.
"/APPEND" says to append (add) to the output file if it exists already. This
option is only available if you're creating either a fixed-field or ASCII-
delimited output file.
"/OVERASK" says to ask if the output file exists already. This is initially
the default.
"/Iinitfile" says to read an initialization file with the file name "initfile".
The file specification *must* contain a period. If no drive or path information
is specified, the program will search for initfile beginning in your default
subdirectory and then going throughout your DOS path. The use of an
initialization file is optional. Initially defaults to "/ICONVERT.INI".
"/-I" (or "/INULL") says to skip loading the initialization file.
"/FROM FIXED", "/FROM ASCII", and "/FROM DBF" specifies the format for the input
file. The routine will typically read the file and guess its format for you.
You can also specify a default by using the CONFIGWS.EXE program.
"/DELETED" applies to dBase input files only. It says you want to retain
records tagged as "deleted". Otherwise, they're dropped in the output file.
"/-DELETED" applies to dBase input files only. It says to drop records tagged
as "deleted". This is initially the default.
"/TO FIXED", "/TO ASCII", "/TO WKS", and "/TO DBF" tells the routine what sort
of output file you'd like to create. Initially defaults to "/TO WKS".
"/HEADER" is used in conjunction with WKS output files. If /HEADER is in
effect, the first row of the spreadsheet will contain the variable name for the
cell as provided in your control file. If none are provided, the field names
will be FIELD_01 onward. /-HEADER turns this off and is the default. Note that
the header line (if any) will show up in the output counts.
"/DELIMS=aroundstrings,aroundnums,betweenfields" allows you to specify the
delimiters (in sequence) around string fields, around numeric fields, and
between fields. Defaults to:
/DELIMS=",,,
(Use quotes around character strings, nothing around numeric data, and the third
comma indicates that there is a comma between fields.) The replacement string
can include hexadecimal codes (in the &Hxx format) or decimal codes (in the \ddd
format) if necessary so either of the following would put a tab between fields:
/DELIMS=",,&H09
/DELIMS=",,\009
See the table of hexadecimal and decimal codes at the end of this documentation.
"/BEEP" beeps when the program is finished.
"/-BEEP" reverses /BEEP. Initially the default.
"/INMISS=val" specifies that any numeric value that has the character string
representation of "val" will be considered missing. Note that this is an exact
character string comparison so /INMISS=1 will not compare to a value of "1.00".
Defaults to /INMISS=NULL (which translates as spaces).
"/INMISSC=val" specifies that any character string value that has the value of
"val" will be considered missing. Defaults to /INMISSC=NULL (which translates
as spaces).
"/OUTMISS=val" specifies that any missing numeric input value will be translated
on output as "val". For example, "/OUTMISS=N.A." would fill in "N.A." for each
missing value. Defaults to /OUTMISS=NULL (which translates as spaces).
"/OUTMISSC=val" specifies that any missing character input value will be
translated on output as "val". Defaults to /OUTMISSC=NULL (which translates as
spaces).
"/SKIP" says to skip records with bad data values; otherwise the routine
aborts when it runs into any. /SKIP, /MISSING, and /ABORT are mutually
exclusive.
"/MISSING" says to presume any missing fields in an ASCII-delimited record
should be filled in with blanks (for character fields) and 0 for numeric fields.
Incomplete records are written out (unlike in /SKIP). /SKIP, /MISSING, and
/ABORT are mutually exclusive. Note that the program will only print out the
first "bad" record. There may be others that show up after this one.
"/ABORT" says to abort when you run into bad records. Initially the default.
/SKIP, /MISSING, and /ABORT are mutually exclusive.
"/FIRSTOBS=n" says to start reading the data beginning with record number n.
Initially defaults to "/FIRSTOBS=1".
"/LASTOBS=n" says to stop reading the data after record number n. Initially
defaults to "/LASTOBS=2000000000" (2 billion).
"/NULLS" allows fields that begin with decimal 0 to be left in the output file
as valid values. Otherwise, they're treated as being missing. /NULLS is
initially the default.
"/-NULLS" translates any field which begins with the decimal 0 value as
missing. For character fields, it's translated to the value of INMISSC, numeric
fields are switched to the INMISS value. /NULLS is initially the default.
"/Q" turns off the line-by-line status report.
"/?" or "/HELP" or "HELP" shows you the syntax for the command.
"/?&H" gives you a hexadecimal and decimal conversion table.
Return codes:
CONVERT returns the following ERRORLEVEL codes:
0 = no problems, file converted
254 = problems for formats of data
255 = syntax problems, file(s) not found, or /? requested
Restrictions and Caveats:
The program cannot handle dBase files with Memo fields.
Most spreadsheet programs restrict a given field length to being 240 characters
or less.
Date fields which are in the form "yy-mm-dd" (instead of "yyyymmdd") should be
declared as character fields instead of date fields.
Date fields converted for WKS files are changed into strings of the "yy-mm-dd"
format.
Decimal and hexadecimal codes:
e.g. "\066\097\116" and "&H426174" both are "Bat"
+---------------------------------------------------------------------------
| dec hex chr | dec hex chr | dec hex chr | dec hex chr | dec hex chr |
+--------------+--------------+--------------+--------------+--------------+
| \000 &H00 nul| \052 &H34 4 | \104 &H68 h | \156 &H9C £ | \208 &HD0 ╨ |
| \001 &H01 | \053 &H35 5 | \105 &H69 i | \157 &H9D ¥ | \209 &HD1 ╤ |
| \002 &H02 | \054 &H36 6 | \106 &H6A j | \158 &H9E ₧ | \210 &HD2 ╥ |
| \003 &H03 | \055 &H37 7 | \107 &H6B k | \159 &H9F ƒ | \211 &HD3 ╙ |
| \004 &H04 | \056 &H38 8 | \108 &H6C l | \160 &HA0 á | \212 &HD4 ╘ |
| \005 &H05 | \057 &H39 9 | \109 &H6D m | \161 &HA1 í | \213 &HD5 ╒ |
| \006 &H06 | \058 &H3A : | \110 &H6E n | \162 &HA2 ó | \214 &HD6 ╓ |
| \007 &H07 bel| \059 &H3B ; | \111 &H6F o | \163 &HA3 ú | \215 &HD7 ╫ |
| \008 &H08 bs | \060 &H3C < | \112 &H70 p | \164 &HA4 ñ | \216 &HD8 ╪ |
| \009 &H09 tab| \061 &H3D = | \113 &H71 q | \165 &HA5 Ñ | \217 &HD9 ┘ |
| \010 &H0A lf | \062 &H3E > | \114 &H72 r | \166 &HA6 ª | \218 &HDA ┌ |
| \011 &H0B vt | \063 &H3F ? | \115 &H73 s | \167 &HA7 º | \219 &HDB █ |
| \012 &H0C pg | \064 &H40 @ | \116 &H74 t | \168 &HA8 ¿ | \220 &HDC ▄ |
| \013 &H0D cr | \065 &H41 A | \117 &H75 u | \169 &HA9 ⌐ | \221 &HDD ▌ |
| \014 &H0E | \066 &H42 B | \118 &H76 v | \170 &HAA ¬ | \222 &HDE ▐ |
| \015 &H0F | \067 &H43 C | \119 &H77 w | \171 &HAB ½ | \223 &HDF ▀ |
| \016 &H10 | \068 &H44 D | \120 &H78 x | \172 &HAC ¼ | \224 &HE0 α |
| \017 &H11 | \069 &H45 E | \121 &H79 y | \173 &HAD ¡ | \225 &HE1 ß |
| \018 &H12 | \070 &H46 F | \122 &H7A z | \174 &HAE « | \226 &HE2 Γ |
| \019 &H13 | \071 &H47 G | \123 &H7B { | \175 &HAF » | \227 &HE3 π |
| \020 &H14 | \072 &H48 H | \124 &H7C | | \176 &HB0 ░ | \228 &HE4 Σ |
| \021 &H15 | \073 &H49 I | \125 &H7D } | \177 &HB1 ▒ | \229 &HE5 σ |
| \022 &H16 | \074 &H4A J | \126 &H7E ~ | \178 &HB2 ▓ | \230 &HE6 µ |
| \023 &H17 | \075 &H4B K | \127 &H7F | \179 &HB3 │ | \231 &HE7 τ |
| \024 &H18 | \076 &H4C L | \128 &H80 Ç | \180 &HB4 ┤ | \232 &HE8 Φ |
| \025 &H19 | \077 &H4D M | \129 &H81 ü | \181 &HB5 ╡ | \233 &HE9 Θ |
| \026 &H1A eof| \078 &H4E N | \130 &H82 é | \182 &HB6 ╢ | \234 &HEA Ω |
| \027 &H1B esc| \079 &H4F O | \131 &H83 â | \183 &HB7 ╖ | \235 &HEB δ |
| \028 &H1C | \080 &H50 P | \132 &H84 ä | \184 &HB8 ╕ | \236 &HEC ∞ |
| \029 &H1D ???| \081 &H51 Q | \133 &H85 à | \185 &HB9 ╣ | \237 &HED φ |
| \030 &H1E ???| \082 &H52 R | \134 &H86 å | \186 &HBA ║ | \238 &HEE ε |
| \031 &H1F ???| \083 &H53 S | \135 &H87 ç | \187 &HBB ╗ | \239 &HEF ∩ |
| \032 &H20 | \084 &H54 T | \136 &H88 ê | \188 &HBC ╝ | \240 &HF0 ≡ |
| \033 &H21 ! | \085 &H55 U | \137 &H89 ë | \189 &HBD ╜ | \241 &HF1 ± |
| \034 &H22 " | \086 &H56 V | \138 &H8A è | \190 &HBE ╛ | \242 &HF2 ≥ |
| \035 &H23 # | \087 &H57 W | \139 &H8B ï | \191 &HBF ┐ | \243 &HF3 ≤ |
| \036 &H24 $ | \088 &H58 X | \140 &H8C î | \192 &HC0 └ | \244 &HF4 ⌠ |
| \037 &H25 % | \089 &H59 Y | \141 &H8D ì | \193 &HC1 ┴ | \245 &HF5 ⌡ |
| \038 &H26 & | \090 &H5A Z | \142 &H8E Ä | \194 &HC2 ┬ | \246 &HF6 ÷ |
| \039 &H27 ' | \091 &H5B [ | \143 &H8F Å | \195 &HC3 ├ | \247 &HF7 ≈ |
| \040 &H28 ( | \092 &H5C \ | \144 &H90 É | \196 &HC4 ─ | \248 &HF8 ° |
| \041 &H29 ) | \093 &H5D ] | \145 &H91 æ | \197 &HC5 ┼ | \249 &HF9 ∙ |
| \042 &H2A * | \094 &H5E ^ | \146 &H92 Æ | \198 &HC6 ╞ | \250 &HFA · |
| \043 &H2B + | \095 &H5F _ | \147 &H93 ô | \199 &HC7 ╟ | \251 &HFB √ |
| \044 &H2C , | \096 &H60 ` | \148 &H94 ö | \200 &HC8 ╚ | \252 &HFC ⁿ |
| \045 &H2D - | \097 &H61 a | \149 &H95 ò | \201 &HC9 ╔ | \253 &HFD ² |
| \046 &H2E . | \098 &H62 b | \150 &H96 û | \202 &HCA ╩ | \254 &HFE ■ |
| \047 &H2F / | \099 &H63 c | \151 &H97 ù | \203 &HCB ╦ | \255 &HFF |
| \048 &H30 0 | \100 &H64 d | \152 &H98 ÿ | \204 &HCC ╠ | |
| \049 &H31 1 | \101 &H65 e | \153 &H99 Ö | \205 &HCD ═ | |
| \050 &H32 2 | \102 &H66 f | \154 &H9A Ü | \206 &HCE ╬ | |
| \051 &H33 3 | \103 &H67 g | \155 &H9B ¢ | \207 &HCF ╧ | |
+--------------+--------------+--------------+--------------+--------------+