home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!cs.utexas.edu!swrinde!sdd.hp.com!spool.mu.edu!wupost!usc!enterpoop.mit.edu!eru.mt.luth.se!lunic!sunic!corax.udac.uu.se!corax.udac.uu.se!perm
- From: perm@csd.uu.se (Per Mildner)
- Newsgroups: comp.programming
- Subject: Re: Locating duplicates
- Message-ID: <PERM.93Jan5210346@groucho.csd.uu.se>
- Date: 5 Jan 93 20:03:46 GMT
- References: <C07FJo.4vq@ux1.cso.uiuc.edu>
- Organization: Computing Science Dept.,Uppsala University, Sweden
- Lines: 20
- NNTP-Posting-Host: groucho.csd.uu.se
- In-reply-to: ceblair@ux1.cso.uiuc.edu's message of 2 Jan 93 02:13:58 GMT
-
-
- A general solution to this problem (ie not dependent on the values
- being integers) can be found in an fairly recent CACM article (I don't
- have it handy, probably first half of 1992, you probably could use
- WAIS to find it... ).
-
- From memory:
-
- It uses hashing to compute a 'home' for a particular key in the
- vector. It scans the vector while moving keys to their 'home' location
- (and checking if the value at the 'home' is a duplicate or else do
- some clash handling). I don't remember the various performance
- characteristics for the algorithm but I think one of the good things
- was that it was useful on disk based vectors (ie to large to be sorted
- in memory).
-
- --
- Per Mildner perm@CSD.UU.SE
- Computing Science Dept. tel: +46 18181049
- Uppsala University, Sweden fax: +46 18521270
-