Text File | 1995-12-30 | 5KB | 128 lines

This program can be distributed with the following conditions:

1) This documentation is included and unaltered.
2) One may not charge more for it than the price of the media being used in its distribution. This means that one can "sell" a disk containing this program for no more than the price of a blank disk.
3) Special case - inclusion in Aminet CD collections is allowed.

/*============================================================================*/

Name: ftype

Version: 0.9

Function:
This program determines a file's type based on past observations of similar files.

Reason:
Before ftype, it was necessary to update file recognizers as new file types came into usage. One had to go find a "brain file" or specify information from detailed personal knowledge of the new file type. Recognizers in the past often (almost always) relied on a magic tag near the start of a file to identify it.

Ftype learns how to recognize a new file without requiring you to research the file format or go hunting for a brain file. It takes advantage of magic numbers but it doesn't rely on them. This means that you can use ftype to differentiate between C source files and C++ source files if you want to. Of course, if two files are extremely similar, ftype will have difficulty differentiating between them.

Disclaimer:
This is betaware and I don't have any Amigas other than my trusty A3000 to test the system with, so if it makes your machine explode, tell me and I'll fix the program.

My results:
I trained the system with two instances of each of the following file types:

  ILBM
  C source
  JPEG
  LHA
  song lyrics
  SAS/C object files
  PostScript

When tested on five instances of each type, the system has a 100% recognition rate.

Usage:
ftype <options> <filename> <datafile>

  -h:        help
  -i:        implicit training
  -t <name>: train
  -p:        print network
  -v:        verbose
  -d <name>: specify the data file to use

Implicit training will cause the network to refine itself whenever a file is strongly recognized.
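As an illustration only, the idea behind implicit training can be sketched in Python. This is not ftype's code: ftype uses a neural network whose internals are not described in this document, whereas the sketch below swaps in a much simpler nearest-prototype classifier. The byte-histogram features, the class name PrototypeModel, and the strong_threshold value are all assumptions made for the example.

```python
from collections import Counter


def byte_histogram(data):
    """Normalised frequency of each byte value (an assumed stand-in feature vector)."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]


def distance(a, b):
    """Mean squared difference between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class PrototypeModel:
    """Hypothetical nearest-prototype classifier, NOT ftype's neural network."""

    def __init__(self, strong_threshold=1e-4):
        # type name -> (mean feature vector, number of training samples)
        self.prototypes = {}
        # Assumed cut-off below which a match counts as "strong".
        self.strong_threshold = strong_threshold

    def train(self, name, data):
        """Explicit training: fold one sample into the running mean for `name`."""
        feats = byte_histogram(data)
        if name not in self.prototypes:
            self.prototypes[name] = (feats, 1)
            return
        mean, n = self.prototypes[name]
        new_mean = [(m * n + f) / (n + 1) for m, f in zip(mean, feats)]
        self.prototypes[name] = (new_mean, n + 1)

    def recognize(self, data, implicit=False):
        """Return (type name, distance, strong?) for the closest known type."""
        if not self.prototypes:
            raise ValueError("train the model before recognizing files")
        feats = byte_histogram(data)
        name, dist = min(
            ((k, distance(feats, v[0])) for k, v in self.prototypes.items()),
            key=lambda pair: pair[1],
        )
        strong = dist <= self.strong_threshold
        if implicit and strong:
            # Implicit training: a strong match is treated as a new training sample.
            self.train(name, data)
        return name, dist, strong
```

In this sketch a weak match never changes the stored prototypes, so only confidently recognized files can refine the model.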

It is necessary to train the system before any files will be recognized. If you want to be able to detect JPEG and C source files, find a few of them and do this:

  ftype -t c <c source file A>
  ftype -t c <c source file B>
  ftype -t jpeg <JPEG file A>
  ftype -t jpeg <JPEG file B>

Train the system with at least three or four files of each type for good results.

The -p and -v options are used for debugging. They make the system print out a bunch of stuff about its internal state.

Normally the system uses the file libs:ftype.dat to store its past training. If you want to start training from scratch, just delete this file. You can manually specify the data file to use with the -d option.

To ask the program what category a file fits into (an LHA file called sobja.lha, for instance) type this:

  ftype sobja.lha

The program will print out something like this:

  Strongly recognized the file "test/lha/sobja.lha" as the type "lha".
  Distance: 0.010774

The "Strongly" part means that the neural network was able to help in file recognition. Any file that is "Weakly" recognized is different enough from the average file of its type that the neural network was unable to aid in recognition. If you start seeing files weakly recognized, you might want to further train the program.

The "Distance" is the error in the identification compared to the average error for all neurons. A large distance (what counts as large depends on the file types you're working with) means that the identification is uncertain.

Behavior:
The system uses a non-linear neural network to learn about the significance of a file type's different features.

You may notice that the system is slow when dealing with large files. This is not a result of accessing the network; it is due to the somewhat paranoid (overly conservative) way my system does feature extraction.

Future Work:
If anyone makes (polite) suggestions regarding improvements to the system, I will try to implement them.
I may work on speeding up the feature extraction part of the program. Right now, it's fast enough for my A3000 with a fast hard drive but I don't think it would be much fun on an A500.

Tested on:
  Amiga 3000-25
  4 megs fast, 2 megs chip
  Seagate ST32550N drive
  OS 3.1

Write Me:
If you have suggestions regarding improvements you would like to see or bug reports, write me.

-Robert Dick (dickrp@wckn.dorm.clarkson.edu)-