-
-
- What is TbWeeder?
- -----------------
- Revised by Falcon, 3/15/94. Read the entire document before proceeding.
-
- TbWeeder is a utility to weed out duplicate files.
-
- Virus researchers and collectors often receive large virus collections which
- contain many duplicate files. Not all anti-virus vendors use the same virus
- naming conventions, and often a virus sample is renamed to match the
- name printed by the scanner used to identify the virus. These renamed
- files are copied into other collections, so many renamed but identical
- files end up floating around in all kinds of virus collections.
-
- TbWeeder can help to identify duplicate files, and automatically delete them.
-
- Duplicate files are files with the same 32-bit CRC and length. To be
- absolutely sure, TbWeeder will perform a full match - byte by byte - of
- the files if both files are available.
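-
- The check described above can be sketched in Python. This is only an
- illustration, not TbWeeder's own code: the function names are hypothetical,
- and zlib's CRC-32 is assumed here because the document does not say which
- 32-bit CRC variant TbWeeder uses.
-
```python
import filecmp
import zlib


def crc32_and_length(path, chunk_size=65536):
    """Return (crc32, length) for a file, reading it in chunks."""
    crc = 0
    length = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the running CRC back in so the result covers the whole file.
            crc = zlib.crc32(chunk, crc)
            length += len(chunk)
    return crc & 0xFFFFFFFF, length


def is_duplicate(path_a, path_b):
    """True if CRC and length agree and the files also match byte for byte."""
    if crc32_and_length(path_a) != crc32_and_length(path_b):
        return False
    # shallow=False forces a comparison of the actual file contents.
    return filecmp.cmp(path_a, path_b, shallow=False)
```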
-
- TbWeeder can also maintain a database, so it is not necessary to rescan
- all files over and over again to search for duplicates. Note, however,
- that the directories recorded in the .lst file may no longer match once
- files have been moved. I recommend running a batch file (more about that
- later) on each dir after moving the files from the NEWVIRS dir (see below)
- into their proper dirs (alphabetical, by family, etc.).
-
-
- Interesting features
- --------------------
-
- - TbWeeder can handle up to 65534 files.
-
- - TbWeeder can optionally delete duplicate files
-
- - TbWeeder can be used to compare and weed files from one path against
- another path, but also to compare and weed within a single path.
-
- - TbWeeder accepts filename specifications, so it can be used to
- check just one file against a huge collection.
-
- - TbWeeder can maintain two databases, one for the CRC and length
- information, another one for the names of the files in the database.
- To weed out remotely, the relatively small CRC database is sufficient.
-
- - TbWeeder is able to compare files byte for byte for additional security.
-
- - TbWeeder is able to output a report file with all duplicate files.
-
- - TbWeeder is very fast (due to a 128Kb hash table and nifty linked lists!).
-
- - TbWeeder is not network aware, however. If you are running a network,
- you must bring it down before running TbWeeder.
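-
- The hash table and linked lists mentioned above are not described any
- further, so the following Python sketch is a guess at the general idea
- rather than TbWeeder's real layout. It assumes 65536 bucket heads (which
- would fill 128 KB at two bytes each) chained through per-record "next"
- indexes; the class name, bucket function and sentinel value are all
- hypothetical.
-
```python
class CrcIndex:
    """Chained hash table keyed on (crc, length). Illustration only."""

    BUCKETS = 65536      # assumption: 65536 two-byte slots = 128 KB
    EMPTY = 0xFFFF       # assumption: sentinel for "no record"
    MAX_RECORDS = 65534  # file limit quoted in the feature list

    def __init__(self):
        self.heads = [self.EMPTY] * self.BUCKETS
        self.records = []  # each entry: (crc, length, next_index)

    def _bucket(self, crc):
        # Hypothetical bucket function: the low 16 bits of the CRC.
        return crc & 0xFFFF

    def find(self, crc, length):
        """Return the record index for (crc, length), or None."""
        index = self.heads[self._bucket(crc)]
        while index != self.EMPTY:
            rec_crc, rec_length, next_index = self.records[index]
            if rec_crc == crc and rec_length == length:
                return index
            index = next_index
        return None

    def add(self, crc, length):
        """Return (index, True) if newly added, (index, False) if known."""
        existing = self.find(crc, length)
        if existing is not None:
            return existing, False
        if len(self.records) >= self.MAX_RECORDS:
            raise OverflowError("database is full")
        bucket = self._bucket(crc)
        # The new record becomes the head of its bucket's linked list.
        self.records.append((crc, length, self.heads[bucket]))
        self.heads[bucket] = len(self.records) - 1
        return self.heads[bucket], True
```
-
- A lookup only walks the short chain of records that share a bucket, which
- is why this kind of structure stays fast even with tens of thousands of
- files.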
-
- Intended purpose
- ----------------
-
- Example 1:
- Suppose you have a virus collection in directory C:\MYVIRS with viruses
- sorted out. In directory C:\NEWVIRS you receive new virus samples.
- Enter:
- TbWeeder c:\MyVirs /add
- This causes TbWeeder to generate a database with file information.
- To find out which viruses in directory C:\NEWVIRS are duplicates, execute:
- TbWeeder c:\NewVirs
- You can optionally put all duplicate files in a log file by using option /log
- or automatically delete the duplicates by using option /del.
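-
- The two steps of Example 1 (build a database, then check a second
- directory against it) can be sketched in Python as follows. This is an
- illustration under assumptions, not TbWeeder's implementation: the function
- names are hypothetical, zlib's CRC-32 stands in for whatever CRC TbWeeder
- uses, and each file is read into memory in one go for brevity.
-
```python
import os
import zlib


def file_key(path):
    """Return the (crc32, length) pair used to recognise a duplicate."""
    with open(path, "rb") as handle:
        data = handle.read()
    return zlib.crc32(data) & 0xFFFFFFFF, len(data)


def walk_files(root):
    """Yield every file under root in a stable, sorted order."""
    for folder, subdirs, names in os.walk(root):
        subdirs.sort()
        for name in sorted(names):
            yield os.path.join(folder, name)


def build_database(root):
    """Roughly what /add does: remember the first file seen for each key."""
    database = {}
    for path in walk_files(root):
        database.setdefault(file_key(path), path)
    return database


def find_duplicates(root, database):
    """List (duplicate, original) pairs without deleting anything."""
    seen = dict(database)  # work on a copy; the database is not changed
    duplicates = []
    for path in walk_files(root):
        key = file_key(path)
        if key in seen:
            duplicates.append((path, seen[key]))
        else:
            # Also catch duplicates within the scanned path itself.
            seen[key] = path
    return duplicates
```
-
- The returned list corresponds to what option /log would report; deleting
- the first element of each pair would correspond to option /del.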
-
- Example 2:
- Suppose you have a directory VIRUSES and you want to delete all duplicates.
- Enter:
- TbWeeder Viruses /add /del
- This causes TbWeeder to build a database and delete duplicate files at the
- same time!
-
- Example 3:
- Suppose you want to know whether someone else's collection contains the
- same viruses you have, or rather whether he's got any you don't have. Run
- TbWeeder on your own collection (see below for examples) and distribute
- TbWeeder.exe, your TbWeeder.Dat file and these docs to the owner of the
- other collection. TbWeeder can now be used to see if they have any files
- you don't have. Maybe they'd exchange some of them for some you have that
- they don't have.
-
-
- The database (TbWeeder.Dat and TbWeeder.Lst)
- --------------------------------------------
-
- TbWeeder can only be used with an external database, due to the excessive
- amount of data it has to handle when comparing a file against 65000 others!
-
- TbWeeder.Dat will contain the 32-bit CRC and length of all files. This
- information is usually sufficient to find out whether a file is a duplicate
- or not. To be completely sure, TbWeeder can also perform a byte for byte
- comparison when it thinks that two files are identical. In this case,
- however, TbWeeder needs the name of the original file and of course the
- original file itself. Therefore TbWeeder also maintains a name reference,
- named TbWeeder.Lst. This file can become quite large; several megabytes is
- not unusual. If you don't want these extended features, you can save disk
- space by specifying option /noname.
-
- Since TbWeeder.Lst will become very large and is only needed to list the
- name of the first - original - file and to perform a byte by byte match
- (which assumes you have the other party's files), you may choose not to
- distribute this file to others. It can, however, be useful to distribute
- the other file, TbWeeder.Dat, so that others can weed out files remotely
- and avoid sending you files you already have. The maximum size of
- TbWeeder.Dat is 512Kb (with over 65000 files!).
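-
- The size limit quoted above is consistent with storing 8 bytes per file
- (a 4-byte CRC plus a 4-byte length): 65534 x 8 = 524272 bytes, just under
- 512Kb. The Python sketch below shows that arithmetic with a hypothetical
- record layout. The real TbWeeder.Dat format is not documented here, so the
- field order and byte order are assumptions.
-
```python
import struct

# Assumed record: little-endian 32-bit CRC followed by 32-bit length.
RECORD = struct.Struct("<II")
MAX_FILES = 65534


def pack_database(entries):
    """Pack (crc, length) pairs into one flat byte string."""
    if len(entries) > MAX_FILES:
        raise ValueError("too many files for the database")
    return b"".join(RECORD.pack(crc, length) for crc, length in entries)


def unpack_database(blob):
    """Split a flat byte string back into (crc, length) pairs."""
    if len(blob) % RECORD.size:
        raise ValueError("database size is not a whole number of records")
    return [RECORD.unpack_from(blob, offset)
            for offset in range(0, len(blob), RECORD.size)]


# A full database of 8-byte records, in bytes.
FULL_SIZE = MAX_FILES * RECORD.size
```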
-
-
- Usage
- -----
-
- Usage:
- TbWeeder [<path>][<filename>] [<options>...]
-
- If no options are specified, the specified path will be scanned for
- duplicate files. TbWeeder will compare these files against the files
- in the TbWeeder.Dat database, and against the files in the specified
- path itself.
-
- #-> IF THERE IS NOT ALREADY A DATABASE YOU NEED TO SPECIFY OPTION /ADD
-
- Command line options (abbreviations between brackets):
-
-   help   (h)  Display a help file.
-   nosub  (s)  Do not process subdirectories.
-   add    (a)  Store the files which have been found to be unique
-               in the database files.
-   del    (d)  Delete duplicate files.
-   noname (n)  Do not create or consult the large name reference
-               database. This disables the full byte by byte
-               comparison as well.
-   log    (l)  Log duplicate files.
-
- To find files in your collection that a friend might be interested in,
- you can check your files against the other collection's tbweeder.dat.
- Please do the following:
-
- In this section, "copy" means to create an additional, complete copy of
- your files, because this program is going to delete files in the dirs
- specified.
-
- 1. -Copy- or unzip all of your .EXE and .COM virus files into a
-    directory, for example c:\virus, leaving your original files intact.
-    You can also keep them in separate dirs if you like (live123,
-    livea-b, livec-d, livee-f, liveg-h, etc.). Use these backup dir(s)
-    for your processing. (For an example with separate dirs, see the
-    first batch file below.)
-
- 2. Copy TBWEEDER.EXE and the OTHER person's TBWEEDER.DAT file into a
-    directory (let's say c:\tbweeder). If you already have a
-    tbweeder.dat of your own, make sure it is safely moved to a backup
-    dir first.
-
- 3. Go to the TBWEEDER directory and type:
-    TBWEEDER c:\virus /del
-    (you can also give filenames instead of a path).
-
- This could delete the bulk of the files (if it's a large .dat
- collection). The deleted files are the ones the other collection
- already has, leaving only the files that the other collection does not
- have. Now you can zip the files that are left and send them to that
- person if you want to upgrade his collection with all the files you have
- that he does not have. In the case of multiple dirs, just repeat step 3
- for each dir.
- Example:
-
- cd c:\tbweeder
- tbweeder d:\live123 /del
- tbweeder d:\livea-b /del
- tbweeder d:\livec-d /del
- tbweeder d:\livee-f /del
- tbweeder d:\liveg-h /del
- tbweeder d:\livei-j /del
- tbweeder d:\livek-l /del
- tbweeder d:\livem-n /del
- tbweeder d:\liveo-p /del
- tbweeder d:\liveq-s /del
- tbweeder d:\livet-u /del
- tbweeder d:\livev /del
- tbweeder d:\livew-x /del
- tbweeder d:\livey-z /del
- tbweeder d:\unscans /del
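-
- With this many directories, the batch lines can also be generated instead
- of typed by hand. The Python helper below is a hypothetical convenience,
- not part of TbWeeder; it only builds the command text, using the directory
- names and the /add and /del options that appear in this document.
-
```python
def weed_commands(directories, options=("/del",)):
    """Return one 'tbweeder <dir> <options>' line per directory."""
    suffix = " ".join(options)
    return ["tbweeder {} {}".format(directory, suffix)
            for directory in directories]


if __name__ == "__main__":
    # Directory names taken from the example in the text.
    for line in weed_commands([r"d:\live123", r"d:\livea-b", r"d:\unscans"]):
        print(line)
```
-
- Passing options=("/add", "/del") produces the lines for cleaning dupes out
- of your own collection instead.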
-
- If you do this, it's possible that he would do the same for you.
-
- Now destroy his tbweeder.dat file and copy your backup .dat file back
- into the tbweeder dir.
-
- If you want to clean dupes out of your own files, just run it on all of
- your directories: TBWEEDER d:\yourvirusdir#1 /add /del
- Example batch (if in separate dirs):
- d:
- cd \
- rem Remove the previous backup copies from d:\
- del tbweeder.dat
- del tbweeder.lst
- cd tbweeder
- rem Keep a backup of the current database in case it gets corrupted
- copy tbweeder.dat d:\
- copy tbweeder.lst d:\
- rem You absolutely do not want to run TbWeeder twice on the same dirs
- rem without first deleting the .dat file. IT'LL DELETE ALL OF YOUR FILES!
- del tbweeder.dat
- del tbweeder.lst
- tbweeder d:\live123 /add /del
- tbweeder d:\livea-b /add /del
- tbweeder d:\livec-d /add /del
- tbweeder d:\livee-f /add /del
- tbweeder d:\liveg-h /add /del
- tbweeder d:\livei-j /add /del
- tbweeder d:\livek-l /add /del
- tbweeder d:\livem-n /add /del
- tbweeder d:\liveo-p /add /del
- tbweeder d:\liveq-s /add /del
- tbweeder d:\livet-u /add /del
- tbweeder d:\livev /add /del
- tbweeder d:\livew-x /add /del
- tbweeder d:\livey-z /add /del
- tbweeder d:\unscans /add /del
-
- This will create a clean .dat file and .lst file showing exactly where
- your files are located.
-
- Or, once you have your .dat file, you can clean dupes out of your
- directories by the same process: the batch file shown above saves a backup
- copy of the old .dat file, deletes it, and then runs the dupe check.
-
- If you get a collection in from someone else, you can unzip it into a
- directory called c:\newvirus and type TBWEEDER c:\newvirus /DEL
- (using your current .dat file). This will delete all of the new files
- that you already have, leaving only the files that you do not have in your
- collection. You can then run your scanners on them, move them to the
- proper dirs where they are to reside, and run the above batch file again
- (it only takes a few minutes, maybe 5 or 10 at the most). You may omit the
- /del switch at this point; the run then just creates a clean .dat file and
- .lst file.
-
- This program does not support NETWORKS, so bring your net down before
- executing this program. Also, it doesn't work if you put the other
- collection's .lst file in your tbweeder dir. You can use the /noname option
- and generate a simple .dat file to deliver to your friends, but some have
- reported errors using this method.
-
- The original docs in the program suck, so I wrote this to help you understand
- this little known program better. It is a VERY powerful program and we
- encourage its use and distribution. We use it regularly.
-
- We also encourage the exchange of viruses and are here to exchange with you
- if you desire. We will also give you better access on our research
- bbs as a major contributor. You know where we can be contacted (anywhere
- critters can be found). ttyl