- DOCUMENT: AI
-
- This is the artificial intelligence document for TClass. Other documents in
- this directory:
-
- TClass.doc
- Programmer.doc
- Installation.doc
- showbytes.doc
-
- Contents
-
- 1.A - AI used in TClass?
- 1.B - How does it work?
-
- ================================================================================
-
- 1.A - AI used in TClass?
-
- Yes, a minor form of AI. When I started thinking about how to implement the
- 'learn' option in TClass, I decided to base it on a simple method a human
- might use to compare file types.
-
- 1.B - How does it work?
-
- It works like this:
-
- Fig1.1 TClass learn lc.lib lcm.lib lcs.lib lcmffp.lib
-
- Ok, lc.lib is called the "base file" because it's the first file TClass
- pays attention to. TClass loads lc.lib, reads its first twenty bytes, and
- stores them in an array. Let's say the bytes are:
-
- 5 0 3 9 -13 7 92 25 -5 1 20 7 11 92 0 0 0 -2 0 -2
-
- Ok, so now we have a basis to work from. These bytes are stored in a stable
- array, one that won't be overwritten (yet). Next, lcm.lib is opened and its
- first 20 bytes are read into a temporary array.
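A minimal sketch of the header-reading step, in Python. TClass's own source isn't shown in this document, so the names read_header and read_header_from_file are hypothetical. Because the example values include negative numbers, the sketch assumes each byte is read as a signed 8-bit value (-128..127).

```python
import struct

HEADER_SIZE = 20  # TClass compares the first twenty bytes of each file


def read_header(data: bytes, size: int = HEADER_SIZE) -> list[int]:
    """Return the first `size` bytes of `data` as signed values.

    Hypothetical helper: the signed interpretation is an assumption based
    on the negative numbers in the example. Data shorter than `size`
    simply yields a shorter list.
    """
    chunk = data[:size]
    # 'b' is struct's signed-char format code, so 0xF3 unpacks as -13.
    return list(struct.unpack(f"{len(chunk)}b", chunk))


def read_header_from_file(path: str, size: int = HEADER_SIZE) -> list[int]:
    """Open `path` in binary mode and read only the header bytes."""
    with open(path, "rb") as f:
        return read_header(f.read(size), size)
```

For example, read_header(bytes([5, 0, 3, 9, 0xF3])) gives [5, 0, 3, 9, -13], the start of the lc.lib list.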
-
- lc.lib:  5 0 3 9 -13 7 92 25 -5 1 20 7 11 92 0 0 0 -2 0 -2
- lcm.lib: 5 0 3 9 -13 2 92 25 -5 1 20 7 11 92 0 0 3 -2 0 -2
-                      ^                           ^
- As you can see, lcm.lib has two bytes that differ from the base file's. When
- a human sees this, they might think "Ok, those bytes obviously have nothing
- to do with identifying the filetype, so I'll ignore them from now on, since
- I KNOW lc.lib and lcm.lib are of the same type." And this is exactly what
- TClass does. It stamps an "ignore" byte in each of those positions. Now
- lc.lib, the base file, looks like this:
-
- lc.lib: 5 0 3 9 -13 i 92 25 -5 1 20 7 11 92 0 0 i -2 0 -2
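The compare-and-stamp step, repeated for every file after the base file, can be sketched in Python as a simple fold. The names merge and learn are hypothetical, and None stands in for the "i" (ignore) marker; how TClass really represents that marker isn't covered here. The sketch also assumes every header has the full 20 bytes.

```python
from functools import reduce
from typing import Optional

# Hypothetical representation: None marks an "ignore" (i) position.
Pattern = list[Optional[int]]


def merge(base: Pattern, other: list[int]) -> Pattern:
    """Stamp an ignore marker wherever `other` disagrees with `base`.

    Positions that are already ignored stay ignored. zip() stops at the
    shorter list, so a short file would shorten the pattern (an
    assumption; TClass may handle short files differently).
    """
    return [b if b is not None and b == o else None for b, o in zip(base, other)]


def learn(headers: list[list[int]]) -> Pattern:
    """Fold every header into the first one, the "base file"."""
    base, *rest = headers
    return reduce(merge, rest, list(base))


lc = [5, 0, 3, 9, -13, 7, 92, 25, -5, 1, 20, 7, 11, 92, 0, 0, 0, -2, 0, -2]
lcm = [5, 0, 3, 9, -13, 2, 92, 25, -5, 1, 20, 7, 11, 92, 0, 0, 3, -2, 0, -2]
pattern = learn([lc, lcm])
print(" ".join("i" if b is None else str(b) for b in pattern))
# prints: 5 0 3 9 -13 i 92 25 -5 1 20 7 11 92 0 0 i -2 0 -2
```

Passing more headers to learn() just keeps folding, which is the "and the next, and the next" process described below.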
-
- Then the next file is read and compared, and the next, and so on until every
- file has been processed. At the end, lc.lib might look like this:
-
- lc.lib: 5 i 3 9 -13 i 92 25 -5 i 20 i i i 0 0 i -2 0 -2
-
- So we have 7 ignored bytes out of a possible 20. That's 35% ignored, or 65%
- KNOWN bytes. Thus,
-
- Filetype learned! 65% accuracy.
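The accuracy figure is simply the share of positions that are still KNOWN. A small Python sketch, again using None for "i"; the accuracy() name is hypothetical, and since the document doesn't say whether TClass rounds or truncates, this version assumes rounding to the nearest whole percent.

```python
from typing import Optional


def accuracy(pattern: list[Optional[int]]) -> int:
    """Percentage of positions whose byte is KNOWN (not None/ignored).

    Rounding to a whole percent is an assumption of this sketch.
    """
    if not pattern:
        return 0
    known = sum(1 for b in pattern if b is not None)
    return round(100 * known / len(pattern))


# The final lc.lib pattern from the text: 7 of 20 positions ignored.
learned = [5, None, 3, 9, -13, None, 92, 25, -5, None,
           20, None, None, None, 0, 0, None, -2, 0, -2]
print(f"Filetype learned! {accuracy(learned)}% accuracy.")
# prints: Filetype learned! 65% accuracy.
```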
-
- So the accuracy percentage depends on how much the files have in common,
- whether you give TClass 2 files to learn or 100. Now, let's say you goof up:
-
- TClass learn ignore fido.font eclipse.font newfonts/#?.font oops.medmod
-
- Let's say that after all the fonts have been processed, the bytelist looks
- like this:
-
- 9 1 -20 0 i i 0 -5 22 71 30 -5 2 7 91 i -2 0 0 1
-
- The next file is oops.medmod, a MED module, which has a TOTALLY different
- file structure. Still, let's say the MED module happens to have a COUPLE of
- bytes that match the font's:
-
- fonts: 9 1 -20 0 i i 0 -5 22 71 30 -5 2 7 91 i -2 0 0 1
- oops.medmod: 5 1 32 2 45 3 -1 9 82 -2 34 97 5 0 0 0 0 6 0 2
-
- After this comparison, the bytelist is:
-
- i 1 i i i i i i i i i i i i i i i i 0 i
-
- Not very productive. :) That's 18 of 20 bytes ignored (90%), or only 10%
- KNOWN bytes. :) So when this faulty thing called a "font" is recorded into
- TClass.brain, running "TClass i_goofed.library" might report it as a font,
- just because it has a 1 in position 2 and a 0 in position 19. :) So watch
- the accuracy response at the end of a learning session. Or, even better,
- view S:TClass.brain and count the i's. See Programmer.doc for the format of
- the brain file.
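Why a 10% pattern is dangerous shows up when a file is checked against it. The text implies, but doesn't spell out, that a file is identified when every KNOWN byte agrees; this Python sketch assumes exactly that rule. The matches() name and the i_goofed header values are hypothetical, made up for illustration.

```python
from typing import Optional


def matches(pattern: list[Optional[int]], header: list[int]) -> bool:
    """True if every non-ignored (non-None) position equals `header`.

    Assumed matching rule. A header shorter than the pattern is
    treated as a non-match.
    """
    if len(header) < len(pattern):
        return False
    return all(p is None or p == h for p, h in zip(pattern, header))


# The goofed "font" pattern: only position 2 (value 1) and position 19
# (value 0), counting from 1, are still KNOWN.
goofed_font = [None, 1] + [None] * 16 + [0, None]

# Made-up 20-byte header that happens to have 1 at position 2 and 0 at 19.
i_goofed = [77, 1, -3, 8, 0, 12, 5, 5, 9, 0, 64, -8, 2, 2, 0, 31, 7, 4, 0, 99]
print(matches(goofed_font, i_goofed))  # prints: True
```

With only two KNOWN bytes, roughly one random header in 65,536 (256 x 256) would pass, which is why a low accuracy figure is worth re-learning.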
-
-