- Path: sparky!uunet!micro-c!eastwind!chorn
- From: chorn@eastwind.mcds.com (Christopher Horn)
- Newsgroups: comp.bbs.waffle
- Subject: Re: dupweed
- Summary: What I did in its place
- Message-ID: <10aLuB1w165w@eastwind.mcds.com>
- Date: Sat, 21 Nov 92 07:59:35 EST
- References: <D61JuB1w165w@tcscs.UUCP>
- Organization: The East Wind +1 201 875 7063
- Lines: 43
-
- tcscs!zeta@src.honeywell.com (Gregory Youngblood) writes:
-
- > jim@jimmc.chi.il.us (Jim McNicholas) writes:
-
- [some deleted]
-
- > > Problem though, isn't it supposed to find duplicate articles from the root
- > > of news, no matter if they are posted elsewhere or not! Right now I can
- > > delete dupes from news\rec\foo and news\rec\bar, but if something is
- > > cross-posted to comp.foo or comp.bar I'm fubar for sure!! Kevin, any help
- > > available? And no, I'm not a programmer!! Just a waffle head. I guess I could
- > > try writing bat files for every conceivable combination, but that wouldn't work
- > > either!! I can't get dupweed to see extended or expanded memory either, only
- > > the 320,000 or so I have left after waffle is running!!
-
- > I don't know what happened when I used it, but it deleted an entire news
- > directory, not just duplicate articles. When I have more time I'll look at
- > it again, but until then....
-
- I tried it too, and had some problems. Most of them, I suspect, were due
- to stack overflow in the recurse routine. I started hacking on it, but in
- the end decided to write my own code to do it from scratch.
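-
- Very roughly, the non-recursive walk works like the sketch below. This
- is a simplified illustration, not the actual code (I'm using the
- portable dirent calls here rather than DOS directory calls, and all the
- names are made up); the point is just that pending directories sit in a
- heap-allocated work list, so the call stack stays shallow no matter how
- deep the news tree gets.
-
- /*
-  * Sketch only: walk a directory tree without recursing on the call
-  * stack.  Each subdirectory found is pushed onto a work list on the
-  * heap and handled later, so depth costs heap space, not stack space.
-  */
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
- #include <dirent.h>
- #include <sys/stat.h>
-
- #define MAXPATH 128
-
- struct worklist {
-     char path[MAXPATH];
-     struct worklist *next;
- };
-
- static struct worklist *pending = NULL;
-
- static void push_dir(const char *path)
- {
-     struct worklist *w = malloc(sizeof(struct worklist));
-
-     if (w == NULL) {
-         fprintf(stderr, "out of memory\n");
-         exit(1);
-     }
-     strncpy(w->path, path, MAXPATH - 1);
-     w->path[MAXPATH - 1] = '\0';
-     w->next = pending;
-     pending = w;
- }
-
- /* Called once for every plain file found under the starting
-    directory; the dupe check would hang off this. */
- static void handle_file(const char *path)
- {
-     printf("%s\n", path);
- }
-
- void walk(const char *root)
- {
-     push_dir(root);
-     while (pending != NULL) {
-         struct worklist *w = pending;
-         char full[MAXPATH];
-         struct dirent *e;
-         struct stat sb;
-         DIR *d;
-
-         pending = w->next;
-         d = opendir(w->path);
-         if (d != NULL) {
-             while ((e = readdir(d)) != NULL) {
-                 if (strcmp(e->d_name, ".") == 0 ||
-                     strcmp(e->d_name, "..") == 0)
-                     continue;
-                 if (strlen(w->path) + strlen(e->d_name) + 2 > MAXPATH)
-                     continue;          /* too long for this sketch */
-                 sprintf(full, "%s/%s", w->path, e->d_name);
-                 if (stat(full, &sb) != 0)
-                     continue;
-                 if (S_ISDIR(sb.st_mode))
-                     push_dir(full);    /* queue it, don't recurse */
-                 else
-                     handle_file(full);
-             }
-             closedir(d);
-         }
-         free(w);
-     }
- }
-
- int main(int argc, char *argv[])
- {
-     walk(argc > 1 ? argv[1] : ".");
-     return 0;
- }
-
- No recursion means no stack to blow; the cost moves to the heap, where
- a few bytes per pending directory is a lot easier to spare than stack
- frames under DOS.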
-
- Another problem I noticed is that it's fine if you kill the dupes,
- * BUT * if you resequence BEFORE you batch, the file numbers in the
- batch queues will fail to match, which is why Willard Dawson took the
- approach he did with WafHist.
-
- My solution is to kill the dupes, batch, and then go back and resequence.
- Currently the dupe kill routine in my code assumes you get all your news
- from a single upstream site. A painful restriction, but the result is some
- VERY VERY fast code. And my directory recurse function uses very little
- stack space, with no limit on subdirectories, etc. It can handle 3000
- files per directory, probably closer to 5000 if one wanted it to. I'm
- currently trying to decide how to effectively clean dupes when getting
- multiple feeds, as this does require every article file to be opened,
- since identical articles may have come via different paths. If anyone is
- interested, let me know and I'll either post it or post an announcement
- of where/how to get it when I'm done.
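-
- If it helps picture the cost, a multi-feed check would look roughly
- like the sketch below (illustration only; the names are invented and
- this isn't code I'm releasing): open each article, pull out its
- Message-ID: header, and look the ID up in a table of IDs already seen.
- A hit means a dupe.
-
- /*
-  * Sketch only: dupe detection across feeds by Message-ID.  Every
-  * article gets opened, its Message-ID: header extracted, and the ID
-  * checked against a hash table of IDs already seen.
-  */
- #include <stdio.h>
- #include <stdlib.h>
- #include <string.h>
-
- #define NBUCKETS 2048
-
- struct idnode {
-     char *id;
-     struct idnode *next;
- };
-
- static struct idnode *buckets[NBUCKETS];
-
- static unsigned hash_id(const char *s)
- {
-     unsigned h = 0;
-
-     while (*s)
-         h = h * 31 + (unsigned char)*s++;
-     return h % NBUCKETS;
- }
-
- /* Returns 1 if this ID was already seen (a dupe), 0 if it is new. */
- static int seen_before(const char *id)
- {
-     unsigned h = hash_id(id);
-     struct idnode *n;
-
-     for (n = buckets[h]; n != NULL; n = n->next)
-         if (strcmp(n->id, id) == 0)
-             return 1;
-     n = malloc(sizeof(struct idnode));
-     if (n == NULL || (n->id = strdup(id)) == NULL) {
-         fprintf(stderr, "out of memory\n");
-         exit(1);
-     }
-     n->next = buckets[h];
-     buckets[h] = n;
-     return 0;
- }
-
- /* Pull the Message-ID: header out of one article file.  Headers end
-    at the first blank line.  Returns 1 on success, 0 otherwise. */
- static int get_message_id(const char *path, char *out, int outlen)
- {
-     FILE *fp = fopen(path, "r");
-     char line[512];
-     int found = 0;
-
-     if (fp == NULL)
-         return 0;
-     while (fgets(line, sizeof(line), fp) != NULL) {
-         if (line[0] == '\n' || line[0] == '\r')
-             break;                     /* end of headers */
-         if (strncmp(line, "Message-ID:", 11) == 0) {
-             char *p = line + 11;
-
-             while (*p == ' ' || *p == '\t')
-                 p++;
-             strncpy(out, p, outlen - 1);
-             out[outlen - 1] = '\0';
-             out[strcspn(out, "\r\n")] = '\0';
-             found = 1;
-             break;
-         }
-     }
-     fclose(fp);
-     return found;
- }
-
- int main(int argc, char *argv[])
- {
-     char id[256];
-     int i;
-
-     for (i = 1; i < argc; i++)
-         if (get_message_id(argv[i], id, sizeof(id)) && seen_before(id))
-             printf("dupe: %s\n", argv[i]);
-     return 0;
- }
-
- The table lookups are cheap; the real cost is the fopen() and header
- scan on every single article, which is exactly the work the
- single-feed shortcut avoids.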
-
- ---
- Christopher Horn | "We're all caught in a state of decay..."
- chorn@eastwind.mcds.com | The East Wind +1 201 875 7063
-