ISAMMAKE.DOC
11/21/94
Program written by:
Bruce Guthrie
Room H-4885
U.S. Dept of Commerce/ESA/OBA/BSISD
Washington, D.C. 20230
(202) 482-3234
You may freely copy and re-distribute this program; however, the U.S.
Department of Commerce neither guarantees nor assures compatibility of the
program with all computer software or hardware.
Foreign users: Please provide an Internet e-mail address in all correspondence,
or just e-mail your problems to me at bgu@cu.nih.gov
The ISAMMAKE.EXE program builds an ISAM data base that includes every word found
in a particular set of files. This program is used in conjunction with the
ISAMFIND.EXE program which actually searches and displays the files.
Definition of "word": Currently, the program defines a "word" as consisting
only of letters of the alphabet. Non-letters are treated as word delimiters.
Words are a minimum of three characters in length (this can be changed to
anything from 2 to 5) and a maximum of 10.
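The word rule above can be sketched in Python as a small tokenizer. This is an illustration under stated assumptions, not ISAMMAKE's actual code: the function name `extract_words` and its parameters are hypothetical, matching is assumed to be case-insensitive (words are upper-cased), and words longer than the maximum are assumed to be truncated rather than dropped, since this document does not say which. The optional `accept` argument mirrors the /ACCEPT=string switch described under Syntax.

```python
import re


def extract_words(text, min_len=3, max_len=10, accept=""):
    """Return the index words found in `text`.

    Letters A-Z (plus any characters in `accept`, as with /ACCEPT=) form
    words; every other character is a delimiter.  Assumptions: case is
    ignored, and over-long words are truncated to `max_len` characters.
    """
    if not 2 <= min_len <= 5:
        raise ValueError("minimum word length must be 2 to 5 (/2 ... /5)")
    # re.escape keeps characters such as "-" or "]" literal inside the class
    word_class = "[A-Za-z" + re.escape(accept) + "]+"
    words = []
    for match in re.finditer(word_class, text):
        word = match.group().upper()
        if len(word) >= min_len:
            words.append(word[:max_len])
    return words


print(extract_words("The cat sat on a house-boat."))
# ['THE', 'CAT', 'SAT', 'HOUSE', 'BOAT']
```

Note how the hyphen splits "house-boat" into two words and how "on" and "a" fall below the three-character minimum.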
Since ISAMMAKE.EXE and ISAMFIND.EXE are related and share some of the same
options, some common features are documented in only one of the two
documentation files. In general, most of the shared documentation ends up in
ISAMFIND.DOC, since that's all that people need to search the documents.
Shared documentation is as follows:
   Features                  see ISAMFIND.DOC documentation
   The ISAMFIND.INI file     see ISAMFIND.DOC documentation
   Format statements         see ISAMFIND.DOC documentation
   Quick demo                see ISAMMAKE.DOC documentation
NOTE: You will find you typically need *both* a control file and an
initialization file (*.INI) to run this program. This is because the input and
output format statements (FI= and FO=) cannot be specified from the command line
and can only be specified in the initialization file.
Text files to be processed:
The program expects a control file to be passed in which tells it which files
are to be processed. That control file can be of either of two types:
(1) /C=L: This is a basic file listing with (optional) descriptions. The file
directory is expected to consist entirely of file names, comment lines, and
description continuation lines.
The exact format for the file depends on any format statement found in your
initialization file. (See the "Format statements" section below for a
discussion of format statements.) The default format for ISAMMAKE is
"%fname% %fdesc%". For example:
ISAMMAKE.DOC Documentation for the ISAMMAKE program
C:\AUTOEXEC.BAT My automatic execution batch file
The file description has to be the last parameter in the format. You cannot
have a file description followed by, say, the file date or time. The file name
can include a wildcarded request. In the case of wildcards, the file
description can begin with one or more exclusion requests in the form
"/Xfilespec". For example:
ISAMMAKE.* /X*.EXE Process everything except the EXE files
\AUTO*.* /X*.EXE /X*.COM Some AUTOEXEC.BAT options
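Here is a Python sketch of how a line in the default "%fname% %fdesc%" format might be split and its /X exclusions applied. The helper names are hypothetical, and Python's `fnmatch` only approximates DOS wildcard rules (for example, DOS treats `*.*` as matching names without an extension, while `fnmatch` requires the dot).

```python
import fnmatch
import ntpath


def parse_listing_line(line):
    """Split a /C=L line into (filespec, exclusions, description).

    Leading "/Xfilespec" words in the description are treated as
    exclusion requests, but only when the file name contains wildcards.
    """
    parts = line.split(None, 1)
    if not parts:
        raise ValueError("blank lines should be skipped before parsing")
    filespec = parts[0]
    rest = parts[1].strip() if len(parts) > 1 else ""
    exclusions = []
    if any(ch in filespec for ch in "*?"):
        while rest[:2].upper() == "/X":
            token_and_rest = rest.split(None, 1)
            exclusions.append(token_and_rest[0][2:])
            rest = token_and_rest[1] if len(token_and_rest) > 1 else ""
    return filespec, exclusions, rest


def select_files(filespec, exclusions, candidates):
    """Keep candidates whose file-name part matches filespec and no exclusion.

    Comparison is case-insensitive, as on DOS.
    """
    pattern = ntpath.basename(filespec).upper()
    excluded = [ntpath.basename(x).upper() for x in exclusions]
    chosen = []
    for name in candidates:
        base = ntpath.basename(name).upper()
        if fnmatch.fnmatchcase(base, pattern) and not any(
            fnmatch.fnmatchcase(base, x) for x in excluded
        ):
            chosen.append(name)
    return chosen


spec, excl, desc = parse_listing_line(
    "ISAMMAKE.* /X*.EXE Process everything except the EXE files")
print(select_files(spec, excl, ["ISAMMAKE.DOC", "ISAMMAKE.EXE", "ISAMFIND.DOC"]))
# ['ISAMMAKE.DOC']
```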
Comment lines are any lines beginning with any of the characters in the
/SKIP=string parameter. The default SKIP request is "/SKIP=;"; any line
beginning with a semicolon is skipped.
Description continuation lines begin with a particular character string and are
used when the total file description doesn't fit conveniently on one line. In a
BBS package like TBBS, continuation lines begin with "!>". You can define the
continuation string using the /CONT=string specification. There is no
continuation specification by default.
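Putting the comment, blank-line, and continuation rules together, reading a /C=L file might look like the Python sketch below. `read_listing` is a hypothetical name, and two details are assumptions: a continuation line's prefix is dropped and its text is joined to the previous description with one space, and the comment check is made before the continuation check.

```python
def read_listing(lines, skip=";", cont=None):
    """Collect (filespec, description) pairs from /C=L control-file lines.

    skip: characters that mark comment lines (the /SKIP= value)
    cont: prefix that marks continuation lines (the /CONT= value), or None
    """
    entries = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue                          # blank lines are always ignored
        if line[0] in skip:
            continue                          # comment line
        if cont and line.startswith(cont):
            if entries:                       # assumption: join with one space
                name, desc = entries[-1]
                extra = line[len(cont):].strip()
                entries[-1] = (name, (desc + " " + extra).strip())
            continue
        parts = line.split(None, 1)           # default format "%fname% %fdesc%"
        entries.append((parts[0], parts[1].strip() if len(parts) > 1 else ""))
    return entries


sample = [
    "; TBBS-style listing",
    "ISAMMAKE.DOC Documentation for",
    "!> the ISAMMAKE program",
]
print(read_listing(sample, cont="!>"))
# [('ISAMMAKE.DOC', 'Documentation for the ISAMMAKE program')]
```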
(2) /C=F: This is similar to /C=L except that the file consists of a list of
directory file names. This is known as a "FAR file" in TBBS parlance and is
useful for collecting a variety of files together. Each of the directory files
can itself be in the /C=L format shown above.
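A /C=F file adds one level of indirection: each line names a directory file, which is then read in /C=L form. The Python sketch below turns FAR lines into (directory file, area) pairs and fills in the area the way the /AREA= entry under Syntax describes (the directory name is used when no /AREA= is given). The function name is hypothetical, and so are two details: only the first word of a FAR line is taken as the directory file name, and the default area is that file's base name without its extension, upper-cased and cut to the eight-character width of the .LST "file area" column.

```python
import ntpath


def expand_far(far_lines, skip=";", area=None):
    """Return (directory_file, area) pairs from the lines of a FAR file."""
    pairs = []
    for raw in far_lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line[0] in skip:
            continue  # blank line or comment line
        dir_file = line.split()[0]
        if area is None:
            # assumption: the area defaults to the directory file's base name
            base = ntpath.splitext(ntpath.basename(dir_file))[0]
            this_area = base.upper()[:8]
        else:
            this_area = area
        pairs.append((dir_file, this_area))
    return pairs


print(expand_far(["; BBS file areas", "C:\\BBS\\GAMES.DIR", "C:\\BBS\\UTILS.DIR"]))
# [('C:\\BBS\\GAMES.DIR', 'GAMES'), ('C:\\BBS\\UTILS.DIR', 'UTILS')]
```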
Output files:
ISAMMAKE.EXE creates two output files. The first is an ISAM format file. The
second is a text list of data sets in that file.
People using VBDos/Professional and, in theory, any other language product that
supports ISAM files should be able to read the ISAM file. If you'd like the
data base structure for this file, contact me (Bruce Guthrie) and I'll e-mail
or fax it to you. Note that ISAM files have a minimum size of 64K and grow in
32K increments.
The first seven characters of these file names are set with the /Fcorename
parameter and determined by the person who created the files using the ISAMMAKE
command.
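As a sketch, the two output names can be derived from the /F core name like this in Python. The function is hypothetical; the seven-character limit and the .ISA and F.LST suffixes come from this document, while raising an error for a longer name is an assumption about how a caller might enforce the limit.

```python
import ntpath


def output_names(corename):
    """Return (ISAM file, listing file) names for a /F core name.

    The core name may include a drive and path; its file-name part must
    have one to seven characters and no extension.
    """
    base = ntpath.basename(corename)
    if not 1 <= len(base) <= 7 or "." in base:
        raise ValueError("core name needs 1-7 characters and no extension")
    return corename + ".ISA", corename + "F.LST"


print(output_names("C:\\DATA\\DOCS"))
# ('C:\\DATA\\DOCS.ISA', 'C:\\DATA\\DOCSF.LST')
```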
xxxxxxx.ISA: Two ISAM data bases in one physical file. One tells the program
which files have been indexed and includes information about the size of each
file and how many words were in it. The second data base contains all words
that meet the minimum length cut-off. Note that the program resets the file
attribute of the ISAM file to be read-only. If you want to delete it, you have
to use the DOS ATTRIB command to reset it to read-write (ATTRIB xxxxxxx.ISA -R).
xxxxxxxF.LST: A text file which indicates which files have been indexed and is
a basic dump of the first ISAM file. Sample records are as follows:
1 C:\VBDOS\ISAMDEMO.001 94-11-05 20:59 50 10 Cat Story
2 C:\VBDOS\ISAMDEMO.002 94-11-05 01:15 24 4 Dog Story
3 C:\VBDOS\ISAMDEMO.003 94-11-05 01:16 56 10 House story
4 C:\VBDOS\ISAMFIND.DOC 94-11-12 18:26 14272 1273 Miscellaneous documentation
5 C:\VBDOS\ISAMMAKE.DOC 94-11-13 19:42 18902 1789 Miscellaneous documentation
The record lay-out is as follows:
cols  1-4    document number (records sorted by document name)
      6-57   full document name
     59-66   document creation date (in yy-mm-dd format)
     68-72   document creation time (in hh:mm format)
     74-82   size of the file in bytes
     84-88   number of words in the document
     90-97   file area (/AREA=whatever or from /C=F information)
     99-on   document description
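A Python sketch that reads one .LST record by the column positions above, assuming the file keeps those fixed positions. The field names are illustrative; for the text from column 99 on, the sample records show the document's description, so the sketch calls it `description`.

```python
# (field, first column, last column), 1-based and inclusive as in the
# lay-out above; None means "to the end of the line".
LST_FIELDS = [
    ("doc_number", 1, 4),
    ("full_name", 6, 57),
    ("date", 59, 66),        # yy-mm-dd
    ("time", 68, 72),        # hh:mm
    ("size_bytes", 74, 82),
    ("word_count", 84, 88),
    ("area", 90, 97),
    ("description", 99, None),
]
NUMERIC = ("doc_number", "size_bytes", "word_count")


def parse_lst_record(line):
    """Split one fixed-column .LST record into a dict of fields."""
    record = {}
    for name, first, last in LST_FIELDS:
        # a 1-based inclusive column range maps to the slice [first-1:last]
        record[name] = line[first - 1:last].strip()
    for name in NUMERIC:
        record[name] = int(record[name])
    return record
```

Using a table of column ranges keeps the parser in step with the documented lay-out: a change in the lay-out means editing one list, not the slicing code.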
The ISAMFIND.INI file:
See ISAMFIND.DOC documentation.
Format statements:
See ISAMFIND.DOC documentation.
Quick demo:
Okay. So you've got this program in your hot little hands and you want to see
what it can do for you. Easy enough. There's a batch file ISAMDEMO.BAT that
will build a data base for you and search it for something in it. This demo
presumes you have Vern Buerg's excellent LIST program somewhere in your path.
(By default, ISAMFIND presumes you use my own totally free READ program to
view text files, but it's more realistic to presume you have LIST instead.) If
you don't have LIST, you should edit the ISAMDEMO.BAT file and replace the
"/VLIST" parameter with the name of the file viewer you have.
Run the batch file and, when prompted, key in "house" as the word you want to
see. The program will display a list of file names which contain the word
"house", listing the "best" documents first.
Syntax:
ISAMMAKE /Fcorename /Cctlfile [ /C=L | /C=F ] [ /2 | /3 | /4 | /5 ]
[ /ACCEPT=string ] [ /AREA=string ] [ /CONT=string ] [ /STOP=string ]
[ /OVERWRITE | /-OVERWRITE | /APPEND | /UPDATE ] [ /SKIP=string ] [ /Td: ]
[ /Wn ] [ /PACK | /-PACK ] [ /Iinitfile | /-I ] [ /? | /?&H ]
where:
"/Fcorename" is the filename that will be used as the basis for all ISAM files
that are created. The core name should include a drive and path and up to seven
characters of a file name without an extension.
"/Cctlfile" is the control card file. See description above.
"/C=L" and "/C=F" specify the function of the control card file. See the
discussion above.
"/2", "/3", "/4", and "/5" change the minimum word length. Defaults to /3.
Note that you should specify the same minimum word length when you invoke the
ISAMFIND program too.
"/ACCEPT=string" allows you to specify characters *other than A to Z* that
should be accepted as parts of words. Foreign users, for example, might want to
include some foreign characters. The string can include hexadecimal codes.
"/AREA=string" specifies the area stamp that is to appear in the ultimate file.
This can be set for the entire request. If you're using /C=F and no /AREA is
specified, the program will take the directory name itself to be the AREA
description.
"/CONT=string" specifies that directory line descriptions can continue to
multiple lines which begin with the given string. In TBBS, "/CONT=!>" would be
appropriate. By default, no continuations are expected. The string can
include hexadecimal codes, which is especially necessary if you're using TBBS
directory structures that require "/CONT=!>": specify this as "/CONT=!\062"
instead, since DOS treats ">" on a command line as output redirection.
"/STOP=string" specifies that the description is to stop with a given string.
This is useful if the description includes some filler information at the end.
The string can include hexadecimal codes.
"/OVERWRITE" specifies that the output ISAM files are to be ignored and replaced
if they exist already.
"/-OVERWRITE" specifies that the program should abort if the output ISAM files
already exist.
"/APPEND" specifies that the program is to add any files it can find but it
leaves alone any files already in the data base that this run doesn't account
for. This is used if you're adding some directories that might already be in
the data base.
"/UPDATE" specifies that the program is to add any files it can find. If
there's a pre-existing file in the database that isn't updated, the program will
delete that file and all of its words.
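The four data base switches differ mainly in what happens to files that are already indexed. The Python sketch below models only which files end up in the data base, using plain dictionaries keyed by file name; `apply_run` and the entry values are hypothetical. It assumes that a file found again under /APPEND or /UPDATE simply replaces its old entry. At this level /UPDATE and /OVERWRITE give the same result; the difference is that /UPDATE works on the existing ISAM file, leaving deleted records behind that /PACK can remove.

```python
VALID_MODES = ("/OVERWRITE", "/-OVERWRITE", "/APPEND", "/UPDATE")


def apply_run(existing, scanned, mode):
    """Return the file-name -> entry mapping left after one ISAMMAKE run.

    existing: entries already in the data base, or None if there is none
    scanned:  entries for the files found on this run
    """
    mode = mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    if existing is None or mode == "/OVERWRITE":
        return dict(scanned)          # start from an empty data base
    if mode == "/-OVERWRITE":
        raise FileExistsError("output ISAM files already exist")
    if mode == "/APPEND":
        merged = dict(existing)       # files not found this run stay indexed
        merged.update(scanned)        # assumption: re-found files replace old entries
        return merged
    # /UPDATE: files not found on this run are deleted with their words
    return {name: entry for name, entry in scanned.items()}


old = {"A.DOC": "old words", "B.DOC": "old words"}
new = {"B.DOC": "new words", "C.DOC": "new words"}
print(sorted(apply_run(old, new, "/APPEND")))   # ['A.DOC', 'B.DOC', 'C.DOC']
print(sorted(apply_run(old, new, "/UPDATE")))   # ['B.DOC', 'C.DOC']
```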
"/SKIP=string" indicates that any DIR or FAR lines that begin with any of the
characters in "string" should be skipped as being comments. You can use
hexadecimal codes if you need to. You *must* use hexadecimal codes if you want
to skip lines beginning with a space. Blank lines are always ignored. Defaults
to "/SKIP=;".
"/Td:" specifies the drive to write any temporary ISAM files that the routine
needs. ISAM data bases are used to store and sort the file names and directory
totals. ISAM files cannot be created reliably on certain types of drives. If a
/Td: specification (e.g. "/TC:") is not specified, the routine checks each of
the following drive specifications in order:
- the drive where the ISAM file is being written to (corename)
- the default drive
- drive C
In each case, the program tries to skip the drive if it's either removable or
a remote (network) drive. The latter test is often incorrect. After that, it
tries to create a file on the drive; CD-ROM drives always fail that test.
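The drive-selection order can be sketched in Python as follows. The three predicate arguments are hypothetical stand-ins for the DOS checks the program makes (removable, remote, and whether a file can be created); each takes a drive letter such as "C". Testing a drive only once when it appears twice in the list, and returning None when every candidate fails, are assumptions.

```python
def choose_temp_drive(explicit, corename_drive, default_drive,
                      is_removable, is_remote, can_create_file):
    """Return the drive letter to use for temporary ISAM files, or None."""
    if explicit:                      # /Td: given, e.g. "C:" from /TC:
        return explicit.rstrip(":").upper()
    tried = set()
    for drive in (corename_drive, default_drive, "C"):
        if not drive:
            continue                  # e.g. the core name has no drive letter
        drive = drive.rstrip(":").upper()
        if drive in tried:
            continue                  # assumption: test each drive once
        tried.add(drive)
        if is_removable(drive) or is_remote(drive):
            continue                  # skip removable and (as detected) network drives
        if can_create_file(drive):    # CD-ROM drives always fail this test
            return drive
    return None


removable, remote, cd_rom = {"A"}, {"N"}, {"D"}
print(choose_temp_drive(None, "D", "A",
                        lambda d: d in removable,
                        lambda d: d in remote,
                        lambda d: d not in cd_rom))   # C
```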
"/Wn" specifies the additional weighting that words in the title are to
receive. Title words don't count in the document count. Initially defaults
to "/W3".
"/PACK" compresses the ISAM file after creating it. This is sometimes
necessary because deleted ISAM records are tagged as "deleted" without actually
being removed from the data base. This becomes an issue if you use /UPDATE a
lot. Remember that ISAM files are created with an initial size allocation of
64K and expanded in 32K chunks after that so packing won't necessarily mean an
actual reduction in the size of the file.
"/-PACK" skips the ISAM compression step. This is initially the default.
"/Iinitfile" says to read an initialization file with the file name "initfile".
The file specification *must* contain a period. If no drive or path information
is specified, the program will search for initfile first in your default
directory and then in each directory on your DOS path. The use of an
initialization file is optional. Initially defaults to "/IISAMFIND.INI".
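The search order for the initialization file, including the /-I and /INULL switches described next, might be sketched in Python like this. `find_init_file` is hypothetical; the caller supplies the current directory and the PATH directories (for example `os.environ.get("PATH", "").split(os.pathsep)`), and paths are handled by the host's `os.path` rules rather than DOS's. Using a spec that already contains a drive or path as-is, and returning None when nothing is found, are assumptions.

```python
import os


def find_init_file(switch, current_dir, path_dirs):
    """Resolve an /I switch to an existing file path, or None to skip it."""
    if switch.upper() in ("/-I", "/INULL"):
        return None                   # initialization file explicitly skipped
    if not switch.upper().startswith("/I"):
        raise ValueError("expected an /Iinitfile switch")
    spec = switch[2:]
    if "." not in spec:
        raise ValueError("the initialization file name must contain a period")
    drive, _ = os.path.splitdrive(spec)
    if drive or os.path.dirname(spec):
        # assumption: an explicit drive/path is used without searching
        return spec if os.path.isfile(spec) else None
    for folder in [current_dir, *path_dirs]:
        candidate = os.path.join(folder, spec)
        if os.path.isfile(candidate):
            return candidate
    return None                       # the file is optional, so not an error


print(find_init_file("/IISAMFIND.INI", os.getcwd(),
                     os.environ.get("PATH", "").split(os.pathsep)))
```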
"/-I" (or "/INULL") says to skip loading the initialization file.
"/?" or "/HELP" or "HELP" gives you the syntax of the command.
"/?&H" gives you a hexadecimal and decimal conversion table.
Decimal and hexadecimal codes:
e.g. "\066\097\116" and "&H426174" both are "Bat"
+--------------+--------------+--------------+--------------+--------------+
| dec  hex  chr| dec  hex  chr| dec  hex  chr| dec  hex  chr| dec  hex  chr|
+--------------+--------------+--------------+--------------+--------------+
| \000 &H00 nul| \052 &H34 4  | \104 &H68 h  | \156 &H9C £  | \208 &HD0 ╨  |
| \001 &H01    | \053 &H35 5  | \105 &H69 i  | \157 &H9D ¥  | \209 &HD1 ╤  |
| \002 &H02    | \054 &H36 6  | \106 &H6A j  | \158 &H9E ₧  | \210 &HD2 ╥  |
| \003 &H03    | \055 &H37 7  | \107 &H6B k  | \159 &H9F ƒ  | \211 &HD3 ╙  |
| \004 &H04    | \056 &H38 8  | \108 &H6C l  | \160 &HA0 á  | \212 &HD4 ╘  |
| \005 &H05    | \057 &H39 9  | \109 &H6D m  | \161 &HA1 í  | \213 &HD5 ╒  |
| \006 &H06    | \058 &H3A :  | \110 &H6E n  | \162 &HA2 ó  | \214 &HD6 ╓  |
| \007 &H07 bel| \059 &H3B ;  | \111 &H6F o  | \163 &HA3 ú  | \215 &HD7 ╫  |
| \008 &H08 bs | \060 &H3C <  | \112 &H70 p  | \164 &HA4 ñ  | \216 &HD8 ╪  |
| \009 &H09 tab| \061 &H3D =  | \113 &H71 q  | \165 &HA5 Ñ  | \217 &HD9 ┘  |
| \010 &H0A lf | \062 &H3E >  | \114 &H72 r  | \166 &HA6 ª  | \218 &HDA ┌  |
| \011 &H0B vt | \063 &H3F ?  | \115 &H73 s  | \167 &HA7 º  | \219 &HDB █  |
| \012 &H0C pg | \064 &H40 @  | \116 &H74 t  | \168 &HA8 ¿  | \220 &HDC ▄  |
| \013 &H0D cr | \065 &H41 A  | \117 &H75 u  | \169 &HA9 ⌐  | \221 &HDD ▌  |
| \014 &H0E    | \066 &H42 B  | \118 &H76 v  | \170 &HAA ¬  | \222 &HDE ▐  |
| \015 &H0F    | \067 &H43 C  | \119 &H77 w  | \171 &HAB ½  | \223 &HDF ▀  |
| \016 &H10    | \068 &H44 D  | \120 &H78 x  | \172 &HAC ¼  | \224 &HE0 α  |
| \017 &H11    | \069 &H45 E  | \121 &H79 y  | \173 &HAD ¡  | \225 &HE1 ß  |
| \018 &H12    | \070 &H46 F  | \122 &H7A z  | \174 &HAE «  | \226 &HE2 Γ  |
| \019 &H13    | \071 &H47 G  | \123 &H7B {  | \175 &HAF »  | \227 &HE3 π  |
| \020 &H14    | \072 &H48 H  | \124 &H7C |  | \176 &HB0 ░  | \228 &HE4 Σ  |
| \021 &H15    | \073 &H49 I  | \125 &H7D }  | \177 &HB1 ▒  | \229 &HE5 σ  |
| \022 &H16    | \074 &H4A J  | \126 &H7E ~  | \178 &HB2 ▓  | \230 &HE6 µ  |
| \023 &H17    | \075 &H4B K  | \127 &H7F    | \179 &HB3 │  | \231 &HE7 τ  |
| \024 &H18    | \076 &H4C L  | \128 &H80 Ç  | \180 &HB4 ┤  | \232 &HE8 Φ  |
| \025 &H19    | \077 &H4D M  | \129 &H81 ü  | \181 &HB5 ╡  | \233 &HE9 Θ  |
| \026 &H1A eof| \078 &H4E N  | \130 &H82 é  | \182 &HB6 ╢  | \234 &HEA Ω  |
| \027 &H1B esc| \079 &H4F O  | \131 &H83 â  | \183 &HB7 ╖  | \235 &HEB δ  |
| \028 &H1C    | \080 &H50 P  | \132 &H84 ä  | \184 &HB8 ╕  | \236 &HEC ∞  |
| \029 &H1D    | \081 &H51 Q  | \133 &H85 à  | \185 &HB9 ╣  | \237 &HED φ  |
| \030 &H1E    | \082 &H52 R  | \134 &H86 å  | \186 &HBA ║  | \238 &HEE ε  |
| \031 &H1F    | \083 &H53 S  | \135 &H87 ç  | \187 &HBB ╗  | \239 &HEF ∩  |
| \032 &H20    | \084 &H54 T  | \136 &H88 ê  | \188 &HBC ╝  | \240 &HF0 ≡  |
| \033 &H21 !  | \085 &H55 U  | \137 &H89 ë  | \189 &HBD ╜  | \241 &HF1 ±  |
| \034 &H22 "  | \086 &H56 V  | \138 &H8A è  | \190 &HBE ╛  | \242 &HF2 ≥  |
| \035 &H23 #  | \087 &H57 W  | \139 &H8B ï  | \191 &HBF ┐  | \243 &HF3 ≤  |
| \036 &H24 $  | \088 &H58 X  | \140 &H8C î  | \192 &HC0 └  | \244 &HF4 ⌠  |
| \037 &H25 %  | \089 &H59 Y  | \141 &H8D ì  | \193 &HC1 ┴  | \245 &HF5 ⌡  |
| \038 &H26 &  | \090 &H5A Z  | \142 &H8E Ä  | \194 &HC2 ┬  | \246 &HF6 ÷  |
| \039 &H27 '  | \091 &H5B [  | \143 &H8F Å  | \195 &HC3 ├  | \247 &HF7 ≈  |
| \040 &H28 (  | \092 &H5C \  | \144 &H90 É  | \196 &HC4 ─  | \248 &HF8 °  |
| \041 &H29 )  | \093 &H5D ]  | \145 &H91 æ  | \197 &HC5 ┼  | \249 &HF9 ∙  |
| \042 &H2A *  | \094 &H5E ^  | \146 &H92 Æ  | \198 &HC6 ╞  | \250 &HFA ·  |
| \043 &H2B +  | \095 &H5F _  | \147 &H93 ô  | \199 &HC7 ╟  | \251 &HFB √  |
| \044 &H2C ,  | \096 &H60 `  | \148 &H94 ö  | \200 &HC8 ╚  | \252 &HFC ⁿ  |
| \045 &H2D -  | \097 &H61 a  | \149 &H95 ò  | \201 &HC9 ╔  | \253 &HFD ²  |
| \046 &H2E .  | \098 &H62 b  | \150 &H96 û  | \202 &HCA ╩  | \254 &HFE ■  |
| \047 &H2F /  | \099 &H63 c  | \151 &H97 ù  | \203 &HCB ╦  | \255 &HFF    |
| \048 &H30 0  | \100 &H64 d  | \152 &H98 ÿ  | \204 &HCC ╠  |              |
| \049 &H31 1  | \101 &H65 e  | \153 &H99 Ö  | \205 &HCD ═  |              |
| \050 &H32 2  | \102 &H66 f  | \154 &H9A Ü  | \206 &HCE ╬  |              |
| \051 &H33 3  | \103 &H67 g  | \155 &H9B ¢  | \207 &HCF ╧  |              |
+--------------+--------------+--------------+--------------+--------------+
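A Python sketch of how the \ddd and &H codes might be expanded. `decode_codes` is hypothetical, and it assumes that "\" is always followed by exactly three decimal digits, that an "&H" run takes hex digits two at a time for as long as they continue, and that codes map to characters through code page 437 (the character set shown in the table).

```python
import re

# "\ddd" = one decimal code; "&H" = a run of two-digit hexadecimal codes
CODE_PATTERN = re.compile(r"\\(\d{3})|&H((?:[0-9A-Fa-f]{2})+)")


def decode_codes(text):
    """Replace decimal and hexadecimal codes in a switch value with characters."""
    def expand(match):
        if match.group(1) is not None:
            raw = bytes([int(match.group(1))])   # ValueError if above 255
        else:
            raw = bytes.fromhex(match.group(2))
        return raw.decode("cp437")
    return CODE_PATTERN.sub(expand, text)


print(decode_codes("\\066\\097\\116"), decode_codes("&H426174"))  # Bat Bat
print(repr(decode_codes("!\\062")))                                # '!>'
```

For instance, a /SKIP= value of "\032;" would expand to a space and a semicolon, which is how lines beginning with a space can be skipped.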