F I L E T Y P E 1 . 1
======================
Free Software by TapirSoft Gisbert W.Selke
August 1991
This is a utility similar to the Un*x programme named file. It takes
the name of a file of unknown purpose and tries to guess what kind of a
file it is -- a ZIP archive, an LZH archive, an executable, an MS Word
document, a QuattroPro spreadsheet, a Bitstream font, or whatnot. Often,
the purpose of a file can be gleaned from its extension; but sometimes
(e.g., after transmission by E-mail), this extension is lost, and
sometimes it just isn't meaningful. (After all, there are only finitely
many permissible file extensions, but probably uncountably many purposes
to use files for.)
FileType does its work by looking at specified bytes in the given
file; it tries to match these against a number of known file signatures.
These signatures are stored in a plain ASCII text file; this signature
file can be extended at will and as need dictates.
Naturally, these guesses are not always correct.
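As a rough illustration of the idea (this is not FileType's Pascal source), the following Python sketch compares bytes of a file against a small hard-coded table. The table entries, the offsets, and the name guess_type are assumptions made for this example; the real signatures live in the magic file.

```python
# Illustrative only: a tiny signature table, checked in order.
# Each entry is (offset, signature bytes, description).
SIGNATURES = [
    (0, b"PK\x03\x04", "ZIP archive"),
    (2, b"-lh", "LZH archive"),
    (0, b"MZ", "executable (EXE)"),
]

def guess_type(data: bytes) -> str:
    """Return the description of the first signature that matches."""
    for offset, magic, description in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return description
    return "unknown"

print(guess_type(b"MZ\x90\x00"))        # executable (EXE)
print(guess_type(b"plain text here"))   # unknown
```

A file that merely happens to start with the same bytes is reported as that type, which is why the guesses are not always correct.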
The simplest way to use FileType is
    filetype <filename>
where <filename> is replaced by the name of the file to be examined.
(No wildcards allowed.) Output consists of a header plus a single line.
E.g., if you type
    filetype filetype.exe
you'll get this answer:
    filetype.exe: executable (EXE)
In order to be able to work, FileType must have access to a file
containing the magic signatures; ordinarily, this file is called
MAGIC.FT and is located in the current directory or in the directory
where FILETYPE.EXE itself is stored. (The latter method works only
under MS-DOS 3.x or later.) You can specify a different magic file, or
an explicit path, with the /m switch:
    filetype /mc:\stuff\mymagic.typ foo.bar
(Note that there are no blanks between /m and the file name!)
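The lookup just described can be sketched in Python as follows. The function name find_magic_file and its parameters are hypothetical, and the search order (explicit /m path, then current directory, then the program's directory) is an assumption; the text above does not say which directory is tried first.

```python
import os
from typing import Optional

def find_magic_file(exe_dir: str, override: Optional[str] = None,
                    name: str = "MAGIC.FT") -> Optional[str]:
    """Return the path of the magic file to use, or None if none is found.

    Assumed order: an explicit /m path first, then the current
    directory, then the directory holding FILETYPE.EXE.
    """
    if override is not None:
        # An explicit path replaces the default search entirely.
        return override if os.path.isfile(override) else None
    for directory in (os.getcwd(), exe_dir):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```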
There's another command line switch, /q, which suppresses output of the
header, by the way; and just typing 'filetype' without any arguments
displays usage hints.
If you want to check a whole bunch of files, use something like this:
    for %f in (*.*) do filetype %f
(If you used this inside a batch file, you wouldn't forget to double the
percent characters, would you?)
One thing remains to be told: how to extend the magic file? Just take an
editor that stores plain ASCII files (no extraneous word processor
information!) and add, modify, or delete lines at your leisure.
These are the rules: (Advanced topics are marked with an asterisk.)
- Maximum line length is 255 characters. Lines are CRLF-delimited.
File must be plain (8 bit) ASCII.
- Each line consists of a file recognizer pattern, then at least one
blank, then either a name for the file type thusly identified or
a continuation marker.
- The recognizer sequence consists of a file offset (optional), a bit
mask (optional), and a matching sequence (required). These items, if
present, must be separated by at least one blank each and must appear
in this order.
* The file offset starts with @, then an optional -, followed by
a sequence of hex (!) digits which represent an offset into the file
at which the matching should occur. (No blanks within the file
offset sequence!) Start of file is at 0(!). A negative offset
matches from the end of the file, with the last byte in the file
pointed to by -1. -- Default for offset is 0.
* The bit mask starts with & and is followed by a sequence as
specified for matching sequences (cf. below). This bit mask will be
ANDed bytewise to the bytes found in the file before matching takes
place. Thus, masking with DF would make matching of 7-bit ASCII
characters case-independent. (However, note the use of double quotes
below.) If the bit mask is shorter than the matching sequence, it is
extended with FF (functionally equivalent to no masking at all). --
Default for the bit mask is all FFs.
- The matching sequence can be any mixture of pairs of hex digits and
ASCII strings enclosed in single (') or double (") quotes.
- Characters in single quotes require an exact match, characters in double
quotes are matched case-independently. (Cf. note on case-conversion below.)
- Both subtypes may contain a question mark to stand for any
character. (And I mean 'character', *not* hex digit!)
- ASCII strings may contain escape sequences: \' and \" for
embedded quotes, \b (backspace), \t (tab), \n (newline), \v
(vertical tab), \f (form feed), \r (carriage return), \? (question
mark), \\ (backslash).
- If a starting sequence in this file is identical to the beginning of
another one, the longer sequence should come first.
- Comment lines may start with semicolon or hash mark.
* For case-independent matchings, FileType knows about the upper-case
equivalents of standard 7-bit ASCII characters; under DOS 3.30+, it
can also handle (8-bit) national characters according to your
country code and code page. You can override this knowledge by
including a pair of lines starting with v and ^, respectively. These
lines must not contain any blanks after the line marker and must
match character by character. The 'v' line contains lower-case
characters, the '^' line the corresponding upper-case characters.
You need specify only as many characters as are necessary. These
lines must occur in Magic.FT before the first recognizer line in
which they are needed. -- Note that two or more pairs of translator
lines may be specified, but only the last one used will be in
effect.
* If different places of the file need to be checked for pattern
matching, there are two ways to do so:
- If the places are close together, specify *one* sequence and a bit
mask to ignore the irrelevant bytes by ANDing these with 0.
- Otherwise, use a multi-line matching: specify one recognizer
pattern, but instead of a file type name, include a slash (/);
then, on a new line, specify the next recognizer pattern, this
time using the file type name. (There may be more than one slash-
delimited line.) This way, all the slash-delimited lines *and*
the next one are required to match.
- The first line of the magic file is taboo: do not change it.
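The offset and bit mask rules above can be made concrete with a small Python sketch. It is an interpretation of the rules as written, not a transcription of FileType.Pas; the names parse_offset and matches are hypothetical, and the pattern is assumed to be already decoded into byte values, with None standing for a question mark.

```python
from typing import List, Optional

HEX_DIGITS = "0123456789abcdefABCDEF"

def parse_offset(token: str) -> int:
    """Parse an offset token such as '@1C' or '@-2'; the digits are hex."""
    if not token.startswith("@"):
        raise ValueError("offset must start with @")
    body = token[1:]
    negative = body.startswith("-")
    digits = body[1:] if negative else body
    if not digits or any(c not in HEX_DIGITS for c in digits):
        raise ValueError("offset needs hex digits and no blanks")
    value = int(digits, 16)
    return -value if negative else value

def matches(data: bytes, offset: int, pattern: List[Optional[int]],
            mask: bytes = b"") -> bool:
    """Check one recognizer against the file contents."""
    # Negative offsets count from the end; -1 is the last byte.
    start = offset if offset >= 0 else len(data) + offset
    if start < 0 or start + len(pattern) > len(data):
        return False
    # A short mask is padded with FF, i.e. no masking for the rest.
    full_mask = mask + b"\xff" * (len(pattern) - len(mask))
    for i, wanted in enumerate(pattern):
        if wanted is None:          # '?' accepts any character
            continue
        # The mask is applied to the file byte only, then compared.
        if (data[start + i] & full_mask[i]) != wanted:
            return False
    return True

# Case-independent match through a DF mask ('r' AND DF gives 'R').
print(matches(b"rar!", 0, list(b"RAR"), mask=b"\xdf\xdf\xdf"))       # True
# Offset -1 addresses the last byte of the file.
print(matches(b"abcXYZ", parse_offset("@-1"), [ord("Z")]))            # True
# A 00 mask byte blanks out an irrelevant byte between two fixed ones.
print(matches(b"AxC", 0, [0x41, 0x00, 0x43], mask=b"\xff\x00\xff"))  # True
```

Because only the file bytes are masked, a pattern byte that sits under a 00 mask has to be written as 00 itself, as in the last call.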
Note that a file offset will rarely have to be used, since most files
can be told from their first few bytes (if at all). You may consider
offsets and masking as advanced topics which are necessary only in very
special circumstances. Multi-line matchings will have to be used even
more rarely. -- In any case, remember that you are invited to extend or
change the magic file to suit your own needs.
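For the rare case where multi-line matching is needed, the following Python sketch shows how slash-continued lines could be folded into one rule. It assumes each magic-file line has already been turned into a (check, label) pair, and it assumes that the first matching rule wins, which is what the "longer sequence first" rule suggests but the text does not state outright. The names group_rules and classify, and the two-part format in the example, are made up.

```python
from typing import Callable, List, Optional, Tuple

Check = Callable[[bytes], bool]
Rule = Tuple[List[Check], str]

def group_rules(lines: List[Tuple[Check, str]]) -> List[Rule]:
    """Fold '/'-continued lines into rules of (checks, type name)."""
    rules: List[Rule] = []
    pending: List[Check] = []
    for check, label in lines:
        pending.append(check)
        if label != "/":            # a real type name closes the rule
            rules.append((pending, label))
            pending = []
    # Trailing '/' lines without a closing line are dropped here.
    return rules

def classify(data: bytes, rules: List[Rule]) -> Optional[str]:
    """Return the name of the first rule whose checks all match."""
    for checks, name in rules:
        if all(check(data) for check in checks):
            return name
    return None

lines = [
    (lambda d: d.startswith(b"MZ"), "/"),
    (lambda d: d.endswith(b"END"), "made-up two-part format"),
    (lambda d: d.startswith(b"MZ"), "executable (EXE)"),
]
rules = group_rules(lines)
print(classify(b"MZ\x00\x00END", rules))   # made-up two-part format
print(classify(b"MZ\x00\x00", rules))      # executable (EXE)
```

The more specific two-line rule is listed before the plain one so that it gets the first chance to match.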
That's it. Enjoy. And if you feel I have omitted a really important sort
of files from MAGIC.FT (as distributed) and you know its magic
signature, why not send it to me? I can be reached at
TapirSoft
Gisbert W.Selke
Ermekeilstrasse 28
D-5300 Bonn 1
Germany
E-Mail: <s00100@dbnrhrz1.bitnet>
History:
1.0  01 Aug 1991  It hit the world.
1.1  19 Aug 1991  AARGH. Wildcard handling was broken, discovered
                  by Richard J. Reiner. Fixed. Added multi-line
                  matching. Added automatic national character
                  uppercasing via country code and code page.
                  Increased I/O buffer sizes. Corrected doc bug.
                  Commented source some (gasp).
Oh, the legal stuff:
FileType.Pas contains no material copyrighted by anyone else. I retain
the copyright on FileType; however, there are no restrictions on using
and copying the package, as long as no money is asked for and all files
are distributed unaltered and together. (That is: FILETYPE.PAS,
FILETYPE.EXE, MAGIC.FT, FILETYPE.DOC, suitably archived by your
favourite archiver.) The usual standard disclaimers apply: I cannot be
held responsible for this programme's doing anything at all, or nothing
at all, or not doing what you'd like it to do. (Which, on the other
hand, isn't meant to say that I'd ignore any sufficiently detailed bug
report.)
Registered trademarks etc. used in this file:
Bitstream : Bitstream Inc.
MS-DOS, Microsoft Word: Microsoft Corporation
QuattroPro : Borland International
ZIP : PKWare, Inc.