- Path: sparky!uunet!news.centerline.com!noc.near.net!news.Brown.EDU!qt.cs.utexas.edu!cs.utexas.edu!wupost!emory!uumind!willard!dawson
- From: dawson@willard.UUCP
- Newsgroups: comp.bbs.waffle
- Subject: Re: News/mail reader
- Message-ID: <Dm2BuB2w165w@willard.UUCP>
- Date: Mon, 16 Nov 92 07:52:36 EST
- References: <u0g0TB5w165w@iowegia.uucp>
- Organization: Willard's House BBS, Atlanta, GA -- +1 (404) 664 8814
- Lines: 249
-
- kjhoule@iowegia.uucp (Kevin Houle) writes:
-
- > dell@Apple.COM (Thomas E. Dell) writes:
- >
- > > Also something that will *probably* be in 1.66, will be rejection
- > > of duplicate articles (making retrieval by Message-ID possible.)
- >
- > This would be nice, Tom. Myself and a handful of local sites exchange
- > "dsm" and "ia" newsgroups. We want to achieve some network redundancy,
- > which requires being able to weed out duplicate articles.
- >
- > For the time being, I've written a program that does just that. Given a
- > root directory on the command line, the program recurses through the
- > root and sub-directories and deletes duplicate articles based on the
- > Message-ID header. As duplicates are removed, the filenames are
- > resequenced in ascending order starting with the lowest original
- > article. If anyone is interested, let me know.
- >
- > --
- > Kevin Houle kjhoule@iowegia.uucp
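- For readers who want to try the approach Kevin describes, here is a minimal Python
- sketch. It is not his program, whose code isn't shown. It assumes each article is
- one file with a purely numeric name and that the header block ends at the first
- blank line. It recurses from a root, keeps the lowest-numbered copy of each
- Message-ID, deletes the rest, and renumbers the survivors upward from the lowest
- original number.

```python
import os
import re

# Assumption: articles are files with purely numeric names (e.g. "5", "6"),
# and the header block ends at the first blank line.
MSGID_RE = re.compile(r"^Message-ID:\s*(<[^>]+>)", re.IGNORECASE)


def message_id(path):
    """Return the article's Message-ID, or None if the header is absent."""
    with open(path, encoding="latin-1") as f:
        for line in f:
            if not line.strip():          # blank line: end of headers
                return None
            m = MSGID_RE.match(line)
            if m:
                return m.group(1)
    return None


def dedupe_dir(directory):
    """Delete duplicate articles in one directory, then renumber survivors."""
    names = sorted((n for n in os.listdir(directory)
                    if n.isdigit() and os.path.isfile(os.path.join(directory, n))),
                   key=int)
    if not names:
        return
    seen = set()
    survivors = []
    for name in names:                    # the lowest-numbered copy wins
        path = os.path.join(directory, name)
        mid = message_id(path)
        if mid is not None and mid in seen:
            os.remove(path)
            continue
        if mid is not None:
            seen.add(mid)
        survivors.append(name)
    # Resequence upward from the lowest original number. Each survivor's new
    # number is <= its old one, so renaming in ascending order never clobbers
    # a file still waiting to be renamed (assuming no leading zeros).
    start = int(names[0])
    for offset, name in enumerate(survivors):
        new_name = str(start + offset)
        if new_name != name:
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new_name))


def dedupe_tree(root):
    """Apply dedupe_dir to root and every subdirectory beneath it."""
    for dirpath, _dirnames, _filenames in os.walk(root):
        dedupe_dir(dirpath)
```

- Duplicates are detected only within a single directory, which matches the
- per-newsgroup layout of the spool.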
-
- As some of you may remember, I've likewise been ~dealing~ with duplicate
- messages, for quite some time. My original attempt, ANTIDUPE, was made
- available to the net a while back. Several weeks ago I released another
- utility, which I called HISTORY. Well, I've continued my development
- efforts on that. Someone pointed out that 4DOS already has a built-in
- 'history' command, so, following the example set by others, I've renamed
- it WAFHIST. The WAFHIST program uses a dBase database to manage messages.
- In addition to removing dupes, it will also expire articles.
-
- If anyone wishes to receive the latest, greatest, most up-to-date version,
- drop me a reply. It's kinda big:
-
- c:\wafhist> wc c:/wafhist/*.uue
- 956 973 59053 c:/wafhist/wafhist1.uue
- 955 970 59029 c:/wafhist/wafhist2.uue
- 955 970 59028 c:/wafhist/wafhist3.uue
- 955 970 59028 c:/wafhist/wafhist4.uue
- 955 970 59029 c:/wafhist/wafhist5.uue
- 955 970 59029 c:/wafhist/wafhist6.uue
- 873 895 53922 c:/wafhist/wafhist7.uue
- 6604 6718 408118 total
-
- Oh, did I not say? It's very fast.
- ---------------------------------------------------------------------------
- From WAFHIST.DOC:
-
- Program: WAFHIST
-
- Usage: WAFHIST [-soft|-hard] [-init] [-control] [-verbose]
- [-noerase] [-noindex] [-version]
- [-waffle=path] [-history=path] [-spool=path]
- [-type=USENET|-type=LOCAL] [-batch=filespec]
- [-expire=#days] [-debug=filespec]
-
- All the -options are exactly that: optional.
-
- The defaults are -soft, and a -waffle path derived from the WAFFLE
- environment variable. The default function of WAFHIST is to build or update a
- history file for use in performing control actions and expirations;
- also, duplicate messages are removed from subdirectories that contain
- more than one instance of a message.
-
- Compiler: Clipper 5.01
- Libraries: Funcky 5.0 libraries for Clipper.
- Filename: WAFHIST.PRG
-
- Author's addresses:
- Willard F. Dawson
- 10470 Haynes Bridge Road
- Alpharetta, GA 30202-5033
- Voice: (404) 664-4935 / (404) 303-2343
- willard.uucp: (404) 664-8814 :: Waffle 1.65, 24 hours/day
- gatech.edu!vdbsan!willard!dawson
- emory.edu!slammer!willard!dawson
- X.400: </G=W/S=DAWSON/O=BSAN/ADMD=BELLSOUTH/C=US/@sprint.com>
-
- COMMAND LINE OPTIONS
- Most of the command line options may be combined in any sequence, except
- that a couple are toggle switches (-soft/-hard and -type=USENET/LOCAL).
- You can specify conflicting options on the command line; because the
- command line is scanned from left to right, the last option takes
- precedence.
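- The left-to-right, last-one-wins rule can be illustrated with a small Python
- sketch. This is not WAFHIST's Clipper code. The option names come from the usage
- line above. The dictionary keys, the "BOTH" default and the error handling are
- assumptions made for the illustration.

```python
def parse_args(argv):
    """Scan options left to right; a later option overrides an earlier one."""
    # Defaults taken from the documentation; the key names are illustrative.
    opts = {"mode": "soft", "type": "BOTH", "init": False, "control": False,
            "verbose": False, "noerase": False, "noindex": False,
            "version": False, "history": r"C:\WAFHIST"}
    for arg in argv:
        if not arg.startswith("-"):
            raise ValueError(f"not an option: {arg!r}")
        name, _sep, value = arg[1:].partition("=")
        if name in ("soft", "hard"):              # toggle pair
            opts["mode"] = name
        elif name == "type":                      # toggle: USENET or LOCAL
            if value.upper() not in ("USENET", "LOCAL"):
                raise ValueError(f"bad -type value: {value!r}")
            opts["type"] = value.upper()
        elif name in ("init", "control", "verbose", "noerase", "noindex",
                      "version"):
            opts[name] = True                     # simple flags
        elif name in ("waffle", "history", "spool", "batch", "debug"):
            opts[name] = value                    # path-valued options
        elif name == "expire":
            opts["expire"] = int(value)           # number of days
        else:
            raise ValueError(f"unknown option: {arg!r}")
    return opts
```

- For example, `parse_args(["-hard", "-soft"])` ends in soft mode, because
- -soft is seen last.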
-
- -soft
- This is the default level of operation. Soft means that the history
- database is searched for a current occurrence of the directory/filename.
- If the file is already referenced with a live (unexpired) entry, then
- no further work is done with this filename. This mode of operation
- assumes that all live entries in the database are valid, and that no
- one has subsequently replaced the file with another, possibly invalid
- (or un-referenced) message.
-
- As long as all expiration is done from within this program, this is
- probably a valid assumption. Caveat emptor.
-
- -hard
- A hard search of the database will be made. This means that each file
- is opened and its Message-ID: header is read. That Message-ID is then
- used to query the database; if another file is shown to have the same
- Message-ID, this file is deleted.
-
- (This form of search is disk intensive. Really.)
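- The difference between -soft and -hard can be sketched in Python. This sketch is
- not WAFHIST's implementation. `History` is a hypothetical in-memory stand-in for
- WAFHIST.DBF, with one lookup by file path and one by Message-ID. In soft mode a
- file that already has a live entry is never opened. Hard mode always reads the
- header, which is why it is disk intensive but catches a file that has been
- replaced.

```python
import re
from dataclasses import dataclass, field

MSGID_RE = re.compile(r"^Message-ID:\s*(<[^>]+>)", re.IGNORECASE)


def read_message_id(path):
    """Open an article and return its Message-ID header value, or None."""
    with open(path, encoding="latin-1") as f:
        for line in f:
            if not line.strip():          # end of header block
                return None
            m = MSGID_RE.match(line)
            if m:
                return m.group(1)
    return None


@dataclass
class History:
    """Hypothetical in-memory stand-in for the history database."""
    by_path: dict = field(default_factory=dict)    # file path -> Message-ID
    by_msgid: dict = field(default_factory=dict)   # Message-ID -> file path

    def record(self, path, msgid):
        old = self.by_path.get(path)
        if old is not None and self.by_msgid.get(old) == path:
            del self.by_msgid[old]                 # file content was replaced
        self.by_path[path] = msgid
        self.by_msgid[msgid] = path


def check_file(history, path, hard=False):
    """Return "skipped", "kept", or "duplicate" for one spool file."""
    if not hard and path in history.by_path:
        return "skipped"        # -soft: trust the live entry; file not opened
    msgid = read_message_id(path)   # -hard, or a new file: read the header
    if msgid is None:
        return "skipped"
    owner = history.by_msgid.get(msgid)
    if owner is not None and owner != path:
        return "duplicate"      # caller erases `path` unless -noerase is given
    history.record(path, msgid)
    return "kept"
```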
-
- -init
- Forces WAFHIST to delete and rebuild the history database.
-
- * -control
- Adds control messages not already in database, then proceeds to
- execute cancels, newgroups, and rmgroups. This option is not yet
- supported. Perhaps in the near future...
-
- -verbose
- Causes WAFHIST to report on significant actions, e.g., file erasure.
-
- -noerase
- Causes WAFHIST to skip file erase operations. Use this in combina-
- tion with -verbose to have WAFHIST tell you what it would otherwise
- be doing to your news spool. So, you can test the program without
- committing to the hilt.
-
- -noindex
- Causes WAFHIST to use "LOCATE FOR" syntax rather than "SEEK"; the
- net result of which is that there is no need for the *.NTX index
- files, and they are consequently not built. The savings in file
- space is significant, though the effect on overall runtime speed
- is sure to be dire. (* Note: This is the most recently added
- option, and is therefore the least tested portion of the program.
- Caveat emptor, eh!?)
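- The trade-off behind -noindex can be shown in a short Python sketch. It is an
- analogy, not Clipper code. "LOCATE FOR" behaves like a sequential scan over every
- record, which needs no extra files and takes time proportional to the number of
- records. "SEEK" uses a key index. Here a sorted list of (key, record-number)
- pairs with binary search stands in for an .NTX file, at the cost of extra
- storage.

```python
import bisect


def locate_for(records, msgid):
    """Like LOCATE FOR: scan records in order; no index needed, O(n)."""
    for recno, (mid, _path) in enumerate(records):
        if mid == msgid:
            return recno
    return None


def build_index(records):
    """Stand-in for an .NTX index: (key, record number) pairs sorted by key."""
    return sorted((mid, recno) for recno, (mid, _path) in enumerate(records))


def seek(index, msgid):
    """Like SEEK: binary search on the sorted index, O(log n)."""
    i = bisect.bisect_left(index, (msgid,))   # first entry with key >= msgid
    if i < len(index) and index[i][0] == msgid:
        return index[i][1]
    return None
```

- Both functions return the same record number. The only differences are how
- much work the lookup does and whether the index has to be stored on disk.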
-
- -version
- Report the current software version and exit.
-
- -waffle=
- Points to Waffle's static file. The default action is to find the
- static file by reading the DOS environment variable WAFFLE. This
- option allows you to use an alternative static file.
- Example usage:
- WAFHIST -waffle=c:\waffle\system\static
-
- -history=
- Points to history database directory. This is the directory which
- is used by WAFHIST for creating the history database (WAFHIST.DBF)
- and index (WAFHIST.NTX, and MESSAGID.NTX) files. If left unspecified,
- the default directory is C:\WAFHIST.
- Example usage:
- WAFHIST -history=c:\waffle\admin
-
- -spool=
- Points to Usenet spool directory. If left unspecified, this info
- is derived from the STATIC file (as pointed to by either the WAFFLE
- environment variable or the -waffle= option).
- Example usage:
- WAFHIST -spool=c:\news
-
- -batch=
- Points to a batch specification file. See the accompanying MS-DOS
- batch program file PREBATCH.BAT for an example of usage of this
- option.
-
- [ Here is a description of the algorithm that is used here:
-
- for each line in batch_spec_file do
- search for corresponding entry in history database
- if an entry is not found
- search for existing entry with the same Message-ID header
- if an entry is not found
- add an entry for this message into the history db.
- else (an entry with the same Message-ID already exists)
- delete the current line from the batch_spec_file
- delete the corresponding file from the news spool
- endif
- endif
- endfor
- ]
-
- Example usage:
- WAFHIST -batch=c:\spool\comp16\test
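- The bracketed batch algorithm can be turned into a runnable Python sketch. It
- follows the pseudocode step for step but is not the program's own code. It
- assumes each non-blank line of the batch file names one article path relative
- to the spool directory. `seen_paths` and `seen_ids` are hypothetical in-memory
- stand-ins for the history database. The pseudocode does not cover articles
- without a Message-ID; this sketch simply records and keeps them.

```python
import os
import re

MSGID_RE = re.compile(r"^Message-ID:\s*(<[^>]+>)", re.IGNORECASE)


def read_message_id(path):
    """Return the article's Message-ID header value, or None."""
    with open(path, encoding="latin-1") as f:
        for line in f:
            if not line.strip():          # end of header block
                return None
            m = MSGID_RE.match(line)
            if m:
                return m.group(1)
    return None


def process_batch(batch_path, spool, seen_paths, seen_ids):
    """Apply the batch algorithm; return the batch lines that were kept.

    seen_paths: set of article paths already in the history db.
    seen_ids:   dict mapping Message-ID -> article path.
    """
    with open(batch_path) as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    kept = []
    for line in lines:
        article = os.path.join(spool, line)   # assumption: one path per line
        if article in seen_paths:             # entry for this file exists
            kept.append(line)
            continue
        msgid = read_message_id(article)
        if msgid is None or msgid not in seen_ids:
            seen_paths.add(article)           # add an entry to the history db
            if msgid is not None:
                seen_ids[msgid] = article
            kept.append(line)
        else:                                 # same Message-ID already recorded
            os.remove(article)                # delete the file from the spool
            # leaving `line` out of `kept` deletes it from the batch file
    with open(batch_path, "w") as f:
        f.writelines(ln + "\n" for ln in kept)
    return kept
```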
-
- -expire=
- Messages older than n days are expired (deleted) from the news spool.
- Example usage:
- WAFHIST -expire=14
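- Expiration by age reduces to comparing each article's arrival time with a
- cutoff of "now minus n days". The Python sketch below uses each file's
- modification time as its arrival time. That is an assumption: WAFHIST keeps its
- own dates in the history database. The `erase` flag mirrors what -noerase
- does.

```python
import os
import time

SECONDS_PER_DAY = 86400


def expire_spool(spool, days, now=None, erase=True):
    """Expire articles that arrived more than `days` days ago.

    Arrival time is approximated by file mtime (an assumption). With
    erase=False, files are only reported, as with -noerase.
    Returns the sorted list of expired paths.
    """
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    expired = []
    for dirpath, _dirs, files in os.walk(spool):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                expired.append(path)
                if erase:
                    os.remove(path)
    return sorted(expired)
```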
-
- * -purge=
- Forces WAFHIST to delete info for messages that were cancelled n days
- ago. The reason this info is not deleted as the messages themselves
- are cancelled is so that those messages do not (incorrectly) get re-
- entered into the database if they get re-fed (for whatever reason).
- This function is not yet implemented; neither is there support yet
- for the cancel control message.
- Example usage:
- WAFHIST -purge=14
-
- -type=
- Specifies one of LOCAL or USENET. The default is BOTH if unspecified.
- Only one of either LOCAL or USENET may be specified.
- Example usage:
- WAFHIST -type=USENET
-
- -debug=
- Secondary output (to file) for screen output, just for debugging.
- (The screen output of Clipper programs is in general NOT redirectable
- to files using > or >>. [ I KNOW, yech! ].)
- Example usage:
- WAFHIST -debug=c:\wafhist\wafhist.dbg
- ---------------------------------------------------------------------------
- From UPDATE.DOC:
- Mon Oct 5 07:10:15 EDT 1992 Version 0.10
-
- Fixed bug with piece of code used to detect whether the history.exe file
- was newer than the history.dbf file.
- ----------------------------------------
- Sun Oct 25 21:34:27 EDT 1992 Version 0.11
-
- Fixed bug that incorrectly left news article file open after detecting that
- no Message-Id header exists.
-
- Added new features: a status.dbf database file; an expiration function now
- based on the number of days since the article arrived; and an enhanced
- new-file scan. No file is scanned unless it arrived after the last scan, or
- unless it is older than the expiration date (if the expire option is
- specified).
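- The 0.11 scan rule reduces to a simple predicate. The Python sketch below
- expresses it with timestamps as plain numbers. The last-scan time and the expiry
- cutoff are passed in as parameters. Where WAFHIST actually stores them, for
- example in status.dbf, is not shown here.

```python
def should_scan(arrived, last_scan, expire_cutoff=None):
    """Scan a file only if it is new since the last scan, or old enough to expire."""
    if arrived > last_scan:
        return True                      # arrived after the last scan
    # Older files are revisited only when -expire is in effect.
    return expire_cutoff is not None and arrived < expire_cutoff


def select_for_scan(files, last_scan, expire_cutoff=None):
    """files: iterable of (path, arrival_time) pairs; returns paths to scan."""
    return [path for path, arrived in files
            if should_scan(arrived, last_scan, expire_cutoff)]
```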
- ----------------------------------------
- Sun Nov 15 13:08:12 EDT 1992 Version 0.12
-
- Changed operation of the -batch option so that the recent change (to scan a
- file only if its creation date/time is later than the last scan date/time)
- does not apply. That is, if the -batch option is specified, all the files in
- the given specification file will be examined.
-
- Changed the name of this utility program from HISTORY to WAFHIST, for a
- couple of reasons. For one, 4DOS has an internal history command, resulting
- in a name-space collision. For another, WAFHIST more clearly shows that
- this utility is meant to be used with Waffle, in the same sense as Wafmail
- and other utilities.
- ----------------------------------------
- Sun Nov 15 16:37:28 EDT 1992 Version 0.13
-
- Added support for "LOCATE" searches of the database via the -noindex option,
- enabling those with little disk space to use the program, as the *.NTX index
- files are then unnecessary. This is offered as a configurable option, so
- that those with space to spare can continue to use the index files, as the
- "LOCATE" form of dBase file search is time-intensive, and generally to be
- avoided when at all possible.
-
- --
- dawson@willard.UUCP (Willard Dawson)
- Willard's House BBS, Atlanta, GA -- +1 (404) 664 8814
-