Text File | 1995-12-30 | 5KB | 128 lines
This program can be distributed with the following conditions:
1) This documentation is included and unaltered.
2) One may not charge more for it than the price of the
media being used in its distribution. This means that
one can "sell" a disk containing this program for no more
than the price of a blank disk.
3) Special case - inclusion in Aminet CD collections is
allowed.
/*============================================================================*/
Name:
ftype
Version:
0.9
Function:
This program determines a file's type based on past observances of
similar files.
Reason:
Before ftype, it was necessary to update file recognizers as new file
types came into usage. One had to go find a "brain file" or specify
information from detailed personal knowledge of the new file type.
Recognizers in the past often (almost always) relied on a magic tag
at or near the start of a file to identify it.
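For comparison, the magic-tag approach described above can be sketched
as follows. This is not part of ftype; the function name and the table
are made up for illustration. The tags themselves are the well-known
ones for JPEG, PostScript, LHA and IFF ILBM. Note that plain text types
such as C source or song lyrics have no tag to match.

```python
# Sketch of a traditional magic-tag recognizer (hypothetical, not ftype's code).
# Each entry is (offset, tag bytes, type name).
MAGIC_TAGS = [
    (0, b"\xff\xd8\xff", "jpeg"),   # JPEG streams begin with the SOI marker
    (0, b"%!", "postscript"),       # PostScript files begin with "%!"
    (2, b"-lh", "lha"),             # LHA method ID such as "-lh5-" sits at offset 2
]

def recognize_by_magic(data):
    """Return a type name if a known tag matches, otherwise None."""
    # IFF files start with "FORM", a 4-byte length, then the form type.
    if data[:4] == b"FORM" and data[8:12] == b"ILBM":
        return "ilbm"
    for offset, tag, name in MAGIC_TAGS:
        if data[offset:offset + len(tag)] == tag:
            return name
    return None  # no tag matched, e.g. C source or song lyrics
```

Every new file type needs a new table entry here, which is the
maintenance burden that ftype is meant to remove.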
Ftype learns how to recognize a new file without requiring you to
research the file format or go hunting for a brain file. It takes
advantage of magic numbers but it doesn't rely on them. This means
that you can use ftype to differentiate between C source files and C++
source files if you want to. Of course, if two files are extremely
similar, ftype will have difficulty differentiating between them.
Disclaimer:
This is betaware and I don't have any Amigas other than my trusty A3000
to test the system with, so if it makes your machine explode, tell me
and I'll fix the program.
My results:
I train the system with two instances of each of the following file
types:
ILBM
C source
JPEG
LHA
song lyrics
SAS/C object files
PostScript
When tested on five instances of each type, the system has a 100%
recognition rate.
Usage:
ftype <options> <filename> <datafile>
-h: help
-i: implicit training
-t <name>: train
-p: print network
-v: verbose
-d <name>: specify the data file to use
Implicit training will cause the network to refine itself whenever a
file is strongly recognized.
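The idea behind implicit training can be sketched as below. This is an
illustration only and assumes a much simpler model than ftype's neural
network: each type is stored as a running mean of feature vectors. The
names refine and implicit_train and the model layout are hypothetical.

```python
# Sketch of implicit training (assumed model: one running-mean vector per type;
# ftype itself uses a neural network whose update rule is not documented here).

def refine(mean, count, features):
    """Fold one new feature vector into a running mean."""
    new_count = count + 1
    updated = [m + (f - m) / new_count for m, f in zip(mean, features)]
    return updated, new_count

def implicit_train(model, type_name, features, strongly_recognized):
    """Update the stored model only when the recognition was strong."""
    if not strongly_recognized:
        return False  # weak matches are not trusted enough to learn from
    mean, count = model[type_name]
    model[type_name] = refine(mean, count, features)
    return True
```

Learning only from strong matches keeps a doubtful identification from
pulling the stored description of a type in the wrong direction.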
It is necessary to train the system before any files will be
recognized. If you want to be able to detect JPEG and C source
files, find a few of them and do this:
ftype -t c <c source file A>
ftype -t c <c source file B>
ftype -t jpeg <JPEG file A>
ftype -t jpeg <JPEG file B>
Train the system with at least three or four files for good results.
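If you have many sample files, the training commands above can be
generated rather than typed. The helper below is a sketch: the name
training_commands and the shape of its input are assumptions, and it
uses only the documented form "ftype -t <name> <file>". It also warns
when a type has fewer files than the three suggested above.

```python
# Sketch: build one "ftype -t <name> <file>" command per sample file.
# training_commands is a hypothetical helper, not something ftype provides.

def training_commands(samples, minimum=3):
    """Return (commands, warnings) for a {type_name: [paths]} mapping."""
    commands, warnings = [], []
    for type_name, paths in samples.items():
        if len(paths) < minimum:
            warnings.append(f"{type_name}: only {len(paths)} file(s); "
                            f"at least {minimum} recommended")
        for path in paths:
            # Mirrors the documented usage: ftype -t <name> <file>
            commands.append(["ftype", "-t", type_name, path])
    return commands, warnings
```

Each returned list can be passed to something like subprocess.run on a
system where ftype is available on the command path.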
The -p and -v options are used for debugging. They make the system
print out a bunch of stuff about its internal state.
Normally the system uses the file libs:ftype.dat to store its past
training. If you want to start training from scratch, just delete this
file. You can manually specify the data file to use with the -d
option.
To ask the program what category a file fits into (an LHA file called
sobja.lha, for instance) type this:
ftype sobja.lha
The program will print out something like this:
Strongly recognized the file "test/lha/sobja.lha" as the type "lha".
Distance: 0.010774
The "Strongly" part means that the neural network was able to help in
file recognition. Any file that is "Weakly" recognized is different
enough from the average file of its type that the neural network was
unable to aid in recognition. If you start seeing files weakly
recognized, you might want to train the program further.
The "Distance" is the error in the identification compared to the
average error for all neurons. A large distance (what large is depends
on the file types you're working with) means that the identification is
uncertain.
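To make the "Strongly", "Weakly" and "Distance" output more concrete,
here is a rough sketch of a distance-based recognizer. It is not
ftype's algorithm: it swaps the neural network for a plain nearest-mean
comparison of byte histograms, uses Euclidean distance in place of the
error measure described above, and picks a cutoff of 0.05 arbitrarily.
All names in it are hypothetical.

```python
import math
from collections import Counter

# Sketch only: nearest-mean matching on byte histograms, NOT ftype's network.

def byte_histogram(data):
    """Relative frequency of each of the 256 byte values (assumed features)."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts[b] / total for b in range(256)]

def classify(data, centroids, strong_threshold=0.05):
    """Return (strength, type name, distance) for the closest stored type."""
    features = byte_histogram(data)
    best_type, best_distance = None, math.inf
    for type_name, centroid in centroids.items():
        distance = math.dist(features, centroid)
        if distance < best_distance:
            best_type, best_distance = type_name, distance
    # Assumed rule: a small distance counts as a strong recognition.
    strength = "Strongly" if best_distance <= strong_threshold else "Weakly"
    return strength, best_type, best_distance

def report(path, data, centroids):
    """Format a result in the style of ftype's output."""
    strength, type_name, distance = classify(data, centroids)
    return (f'{strength} recognized the file "{path}" as the type "{type_name}".\n'
            f"Distance: {distance:.6f}")
```

As in the real program, what counts as a large distance depends on how
far apart the trained types are, so a fixed cutoff like this one would
need tuning for each set of file types.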
Behavior:
The system uses a non-linear neural network to learn about the
significance of a file type's different features. You may notice that
the system is slow when dealing with large files. This is not a result
of accessing the network; it is due to the somewhat paranoid (overly
conservative) way my system does feature extraction.
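The sketch below shows why feature extraction can dominate the running
time. It is an assumption about the general shape of the problem, not
ftype's actual extractor: it reads every byte of the file to build a
256-entry histogram. The work grows with the file size, while the
result handed to the network is always the same length.

```python
from collections import Counter

# Sketch of a conservative feature extractor that scans the whole file.
# extract_features and the byte-histogram features are hypothetical.

def extract_features(path, chunk_size=64 * 1024):
    """Read the entire file in chunks and return 256 byte frequencies."""
    counts = Counter()
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            counts.update(chunk)   # cost is proportional to the file size
            total += len(chunk)
    if total == 0:
        return [0.0] * 256         # empty file: no bytes to count
    return [counts[b] / total for b in range(256)]
```

A less conservative extractor could examine only part of the file, at
the cost of missing features that appear later in it.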
Future Work:
If anyone makes (polite) suggestions regarding improvements to the
system, I will try to implement them. I may work on speeding up the
feature extraction part of the program. Right now, it's fast enough for
my A3000 with a fast hard drive but I don't think it would be much fun
on an A500.
Tested on:
Amiga 3000-25
4 megs fast, 2 megs chip
Seagate ST32550N drive
OS 3.1
Write Me:
If you have suggestions regarding improvements you would like to see or
bug reports, write me.
-Robert Dick (dickrp@wckn.dorm.clarkson.edu)-