MACD 5

Text File | 1992-12-18 | 3.3 KB | 91 lines

DOCUMENT: AI

This is the artificial intelligence document for TClass.

Other documents in this directory:

    TClass.doc
    Programmer.doc
    Installation.doc
    showbytes.doc

Contents

    1.A - AI used in TClass?
    1.B - How does it work?

================================================================================

1.A - AI used in TClass?

Yes, a minor form of AI. When I started thinking about how I was going to go
about the 'learn' option in TClass, I thought I'd go by a simple algorithm
that humans might use to compare file types.

1.B - How does it work?

It works like this:

Fig1.1
    TClass learn lc.lib lcm.lib lcs.lib lcmffp.lib

Ok, lc.lib is referred to as the "base file" because that's the first file
that's paid attention to. What TClass does is, it loads in lc.lib, reads the
first twenty bytes, and stores them in an array. Let's say the bytes are:

    5 0 3 9 -13 7 92 25 -5 1 20 7 11 92 0 0 0 -2 0 -2

Ok, so we have some sort of basis here. These bytes are stored in a stable
array, one that isn't going to be overwritten--yet. Next, lcm.lib is opened,
and its first 20 bytes are read into a temporary array.

    lc.lib:  5 0 3 9 -13 7 92 25 -5 1 20 7 11 92 0 0 0 -2 0 -2
    lcm.lib: 5 0 3 9 -13 2 92 25 -5 1 20 7 11 92 0 0 3 -2 0 -2
                         ^                           ^

As you see, lcm.lib has two bytes (positions 6 and 17) that conflict with the
original's. When a human sees this, they may think "Ok, those bytes obviously
have nothing to do with identifying the filetype, so I'll ignore those in the
future since I KNOW lc.lib and lcm.lib are of the same type." And this is
exactly what TClass does. It stamps an "ignore" byte in that position. Now
lc.lib, the base file, looks like this:

    lc.lib:  5 0 3 9 -13 i 92 25 -5 1 20 7 11 92 0 0 i -2 0 -2

And the next file is read, and the next, and the next, until it's done, and
at the end, lc.lib might look like:

    lc.lib:  5 i 3 9 -13 i 92 25 -5 i 20 i i i 0 0 i -2 0 -2

So we have 7 ignored bytes out of a possible 20. That's 35% ignored, or 65%
KNOWN bytes. Thus, Filetype learned! 65% accuracy.
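The learn pass described above can be sketched in a few lines of Python. This is only an illustration of the idea, not TClass's actual source: the names learn, accuracy and IGNORE are made up for this sketch, and the 20-byte signatures are typed in as lists instead of being read from the files.

```python
IGNORE = "i"  # hypothetical stand-in for the "ignore" byte described in the text


def learn(signatures):
    """Fold several 20-value signatures into one pattern.

    The first signature plays the role of the "base file". Every position
    where a later file disagrees with the pattern is stamped IGNORE.
    """
    pattern = list(signatures[0])
    for other in signatures[1:]:
        for pos, value in enumerate(other):
            if pattern[pos] != value:
                pattern[pos] = IGNORE
    return pattern


def accuracy(pattern):
    """Percentage of positions that are still KNOWN (not ignored)."""
    known = sum(1 for value in pattern if value != IGNORE)
    return 100 * known // len(pattern)


# The two example signatures from the text.
lc = [5, 0, 3, 9, -13, 7, 92, 25, -5, 1, 20, 7, 11, 92, 0, 0, 0, -2, 0, -2]
lcm = [5, 0, 3, 9, -13, 2, 92, 25, -5, 1, 20, 7, 11, 92, 0, 0, 3, -2, 0, -2]

pattern = learn([lc, lcm])
ignored = [pos + 1 for pos, value in enumerate(pattern) if value == IGNORE]
print(ignored)            # [6, 17]  (1-based positions stamped "ignore")
print(accuracy(pattern))  # 90
```

After only these two files the pattern is still 90% known. The 65% figure in the text comes from the later state with 7 ignored positions, which accuracy() reproduces as 13 known out of 20.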
So whether you give TClass 2 files to learn, or 100, accuracy percentages can
vary. So, let's say you goof up:

    TClass learn ignore fido.font eclipse.font newfonts/#?.font oops.medmod

Let's say after all the fonts are done, we have this:

    9 1 -20 0 i i 0 -5 22 71 30 -5 2 7 91 i -2 0 0 1

Well, the next file is oops.medmod, a MED module. Well, that has a TOTALLY
different file structure. So let's say that the MED mod might have a COUPLE
bytes that resemble a font:

    fonts:       9 1 -20 0 i i 0 -5 22 71 30 -5 2 7 91 i -2 0 0 1
    oops.medmod: 5 1 32 2 45 3 -1 9 82 -2 34 97 5 0 0 0 0 6 0 2

Well, after this comparison, we have this as the bytelist:

    i 1 i i i i i i i i i i i i i i i i 0 i

Not very productive. :) That's 18/20 ignores, 90% ignored bytes, or 10% KNOWN
bytes. :)

So when this faulty thing called a "font" is recorded into TClass.brain, you
could do "TClass i_goofed.library" and it might be called a font, just
because it has a 1 in position 2 and a 0 in position 19. :)

So watch the accuracy response at the end of a learning session. Or, even
more effective, you could view S:TClass.brain and check out the number of
i's. See Programmer.doc for the format of the brain file.
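Here is a rough Python sketch of why a goofed pattern is dangerous. It assumes that TClass identifies a file by checking that every non-ignored position of a learned pattern equals the corresponding byte of the file. The text implies this but does not spell it out. The names merge, matches and known_percent are invented for this sketch, and the bytes given for i_goofed.library are made up purely for illustration.

```python
IGNORE = "i"  # hypothetical stand-in for the "ignore" byte described in the text


def merge(pattern, data):
    """Compare one more file against the pattern, stamping IGNORE on conflicts."""
    return [p if p == d else IGNORE for p, d in zip(pattern, data)]


def matches(pattern, data):
    """Assumed matching rule: every KNOWN position must equal the file's byte."""
    return all(p == IGNORE or p == d for p, d in zip(pattern, data))


def known_percent(pattern):
    """Percentage of positions that are still KNOWN (not ignored)."""
    known = sum(1 for p in pattern if p != IGNORE)
    return 100 * known // len(pattern)


# Pattern after the fonts, and the MED module's bytes, both from the text.
fonts = [9, 1, -20, 0, "i", "i", 0, -5, 22, 71, 30, -5, 2, 7, 91, "i", -2, 0, 0, 1]
medmod = [5, 1, 32, 2, 45, 3, -1, 9, 82, -2, 34, 97, 5, 0, 0, 0, 0, 6, 0, 2]

goofed = merge(fonts, medmod)
print(known_percent(fonts))   # 85
print(known_percent(goofed))  # 10

# Made-up bytes for an unrelated file that happens to have
# a 1 in position 2 and a 0 in position 19.
i_goofed = [77, 1, 12, -3, 0, 0, 44, 8, 19, -7, 60, 2, 2, 31, 5, 9, -1, 13, 0, 100]

print(matches(fonts, i_goofed))   # False: the clean font pattern rejects it
print(matches(goofed, i_goofed))  # True: the goofed pattern calls it a "font"
```

Under that assumed rule, a pattern with only two known positions accepts almost anything, which is why a low accuracy figure at the end of a learning session is worth a second look.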