NetNews Usenet Archive 1993 #1

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1993 #1 / NN_1993_1.iso / spool / comp / programm / 3427 < prev next >

Wrap

Internet Message Format | 1993-01-07 | 1.3 KB

Path: sparky!uunet!cs.utexas.edu!swrinde!sdd.hp.com!spool.mu.edu!wupost!usc!enterpoop.mit.edu!eru.mt.luth.se!lunic!sunic!corax.udac.uu.se!corax.udac.uu.se!perm From: perm@csd.uu.se (Per Mildner) Newsgroups: comp.programming Subject: Re: Locating duplicates Message-ID: <PERM.93Jan5210346@groucho.csd.uu.se> Date: 5 Jan 93 20:03:46 GMT References: <C07FJo.4vq@ux1.cso.uiuc.edu> Organization: Computing Science Dept.,Uppsala University, Sweden Lines: 20 NNTP-Posting-Host: groucho.csd.uu.se In-reply-to: ceblair@ux1.cso.uiuc.edu's message of 2 Jan 93 02:13:58 GMT A general solution to this problem (ie not dependent on the values being integers) can be found in an fairly recent CACM article (I don't have it handy, probably first half of 1992, you probably could use WAIS to find it... ). From memory: It uses hashing to compute a 'home' for a particular key in the vector. It scans the vector while moving keys to their 'home' location (and checking if the value at the 'home' is a duplicate or else do some clash handling). I don't remember the various performance characteristics for the algorithm but I think one of the good things was that it was useful on disk based vectors (ie to large to be sorted in memory). -- Per Mildner perm@CSD.UU.SE Computing Science Dept. tel: +46 18181049 Uppsala University, Sweden fax: +46 18521270