The Unsorted BBS Collection

Text File | 1993-11-26 | 5KB | 129 lines

What is TbWeeder
----------------

TbWeeder is a utility to weed out duplicate files. Virus researchers often receive large virus collections which contain many duplicate files. Not all anti-virus vendors use the same virus naming convention, and a virus sample is often renamed to match the name reported by the scanner used to identify it. These renamed files are copied into other collections, so many renamed but identical files float around in all kinds of virus collections. TbWeeder can help to identify duplicate files and automatically delete them.

Duplicate files are files with the same 32-bit CRC and length. To be absolutely sure, TbWeeder will perform a full match - byte by byte - of the files if both files are available. TbWeeder can also maintain a database, so it is not necessary to rescan all files over and over again to search for duplicates.

Interesting features
--------------------

- TbWeeder can handle up to 65534 files.
- TbWeeder can optionally delete duplicate files.
- TbWeeder can compare and weed files from one path against another path, as well as within a single path.
- TbWeeder accepts filename specifications, so it can be used to check just one file against a huge collection.
- TbWeeder can maintain two databases: one for the CRC and length information, and another for the names of the files in the database. To weed out remotely, the relatively small CRC database is sufficient.
- TbWeeder is able to compare files byte for byte for additional security.
- TbWeeder is able to output a report file listing all duplicate files.
- TbWeeder is fast (due to a 128Kb hash table and nifty linked lists!).

Intended purpose
----------------

Example 1: Suppose you have a virus collection in directory C:\MYVIRS with the viruses already sorted. In directory C:\NEWVIRS you receive new virus samples. Enter:

    TbWeeder c:\MyVirs /add

This causes TbWeeder to generate a database with file information.
To find out which viruses in directory C:\NEWVIRS are duplicates, execute:

    TbWeeder c:\NewVirs

You can optionally list all duplicate files in a log file by using option /log, or automatically delete the duplicates by using option /del.

Example 2: Suppose you have a directory VIRUSES and you want to delete all duplicates. Enter:

    TbWeeder Viruses /add /del

This causes TbWeeder to build a database and delete duplicate files at the same time!

Example 3: Suppose you want to know whether the viruses in someone else's collection are ones you already have. Run TbWeeder on your own collection with option /noname, and give TbWeeder and the TbWeeder.Dat file to the owner of the other collection. TbWeeder can then be used there to create a log file of all known files.

The database
------------

TbWeeder can only be used with an external database, due to the large amount of data it has to handle when comparing a file against 65000 others! TbWeeder.Dat contains the 32-bit CRC and length of all files. This information is usually sufficient to find out whether a file is a duplicate or not. To be completely sure, TbWeeder can also perform a byte-for-byte comparison after the CRC and length indicate that two files are identical. However, in this case TbWeeder needs the name of the original file, and of course the original file itself. Therefore TbWeeder also maintains a name reference file, TbWeeder.Lst. This file can become quite large; several megabytes is not unusual. If you don't need these extended features, you can save disk space by specifying option /noname.

TbWeeder.Lst becomes very large and is only needed to list the name of the first (original) file and to perform a byte-by-byte match. You may therefore choose not to distribute this file to others. It can, however, be useful to distribute the other file, TbWeeder.Dat, so that others can weed out files remotely (and so that people do not send you files you already have). The maximum size of TbWeeder.Dat is 512Kb (with over 65000 files!).
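The two-step check described above (CRC and length first, then a byte-by-byte comparison only when the original file is known) can be sketched in Python. This is a minimal illustration under stated assumptions, not TbWeeder's actual code:

- The names crc_and_length and is_duplicate are hypothetical.
- Python's zlib.crc32 is assumed to stand in for whatever 32-bit CRC TbWeeder uses.
- A plain dict stands in for TbWeeder.Dat plus TbWeeder.Lst. It maps each (CRC, length) pair to the original file name, or to None when only the CRC data is available (as with /noname).

```python
import filecmp
import zlib
from pathlib import Path


def crc_and_length(path):
    """Return the (32-bit CRC, length) pair used as the duplicate key.

    Assumption: zlib's CRC-32 stands in for TbWeeder's own 32-bit CRC.
    """
    crc = 0
    length = 0
    with open(path, "rb") as f:
        # Read in chunks so large samples need not fit in memory at once.
        while chunk := f.read(65536):
            crc = zlib.crc32(chunk, crc)
            length += len(chunk)
    return crc, length


def is_duplicate(path, database):
    """Check `path` against a hypothetical in-memory database.

    `database` maps (crc, length) to the original file name, or to None
    when names are not kept (the /noname situation).
    """
    key = crc_and_length(path)
    if key not in database:
        return False
    original = database[key]
    if original is None or not Path(original).exists():
        # No original to compare against: the CRC and length match is all we have.
        return True
    # Both files are available: confirm with a full byte-by-byte comparison.
    return filecmp.cmp(original, path, shallow=False)
```

A note on the size limit: if each TbWeeder.Dat entry were stored as a 4-byte CRC plus a 4-byte length (an assumption; this document does not describe the file layout), 65534 entries would take 65534 * 8 = 524,272 bytes. That is just under 512Kb (524,288 bytes), which is consistent with the stated maximum.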
Usage
-----

Usage: TbWeeder [<path>][<filename>] [<options>...]

If no options are specified, the specified path will be scanned for duplicate files. TbWeeder will compare these files against the files in the TbWeeder.Dat database, and against the other files in the specified path itself.

-> IF THERE IS NOT ALREADY A DATABASE YOU NEED TO SPECIFY OPTION /ADD

Command line options (abbreviations between brackets):

help   (h)  display a help file.
nosub  (s)  do not process subdirectories.
add    (a)  store the files which have been found to be unique in the database files.
del    (d)  delete duplicate files.
noname (n)  do not create or consult the large name reference database. This disables the full byte-by-byte comparison as well.
log    (l)  log duplicate files
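The default behaviour and the options above can be modelled with a short Python sketch. It is not TbWeeder's implementation, and it rests on these assumptions:

- weed() and its keyword arguments are hypothetical stand-ins for /add, /del, /nosub and /noname.
- The database is an in-memory dict mapping (CRC, length) to the original file name, or to None when names are not kept.
- The returned list plays the role of the /log report.

Each file is checked against the database and against the unique files already seen in the same path. The byte-by-byte check is only done when the original's name is known.

```python
import filecmp
import zlib
from pathlib import Path


def crc_and_length(path):
    """Return the (32-bit CRC, length) key; zlib's CRC-32 is an assumed stand-in."""
    crc, length = 0, 0
    with open(path, "rb") as f:
        while chunk := f.read(65536):
            crc = zlib.crc32(chunk, crc)
            length += len(chunk)
    return crc, length


def weed(root, database, add=False, delete=False, nosub=False, noname=False):
    """Scan `root` and return the duplicates found, in scan order.

    Hypothetical model: `database` maps (crc, length) to the original file
    name, or to None without names (/noname). It is updated in place when
    add=True (/add); delete=True (/del) removes duplicates; nosub=True
    (/nosub) skips subdirectories.
    """
    root = Path(root)
    candidates = root.iterdir() if nosub else root.rglob("*")
    files = sorted(p for p in candidates if p.is_file())
    seen = {}  # unique files from this scan, so duplicates within the path are caught
    duplicates = []
    for path in files:
        key = crc_and_length(path)
        if key in database:
            original = None if noname else database[key]
        elif key in seen:
            original = None if noname else seen[key]
        else:
            # Unique file: remember it for the rest of this scan and optionally store it.
            seen[key] = str(path)
            if add:
                database[key] = None if noname else str(path)
            continue
        if original is not None:
            if Path(original).resolve() == path.resolve():
                continue  # the database entry is this very file, not a copy of it
            if Path(original).exists() and not filecmp.cmp(original, path, shallow=False):
                continue  # same CRC and length but different bytes: not a duplicate
        duplicates.append(path)
        if delete:
            path.unlink()
    return duplicates
```

For example, weed("VIRUSES", db, add=True, delete=True) roughly corresponds to Example 2 (TbWeeder Viruses /add /del). Because this sketch keeps only one entry per (CRC, length) pair, a file whose CRC and length collide with a different file is simply skipped rather than stored separately. The real tool's hash table with linked lists may well handle that case differently.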