NetNews Usenet Archive 1993 #1

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1993 #1 / NN_1993_1.iso / spool / comp / programm / 3418 < prev next >

Wrap

Text File | 1993-01-06 | 2.1 KB | 57 lines

Path: sparky!uunet!gatech!destroyer!ncar!noao!amethyst!organpipe.uug.arizona.edu!news From: dave@cs.arizona.edu (Dave Schaumann) Newsgroups: comp.programming Subject: Re: Locating duplicates Message-ID: <1993Jan5.224141.21639@organpipe.uug.arizona.edu> Date: 5 Jan 93 22:41:41 GMT References: <1993Jan2.041658.21596@organpipe.uug.arizona.edu> <DOUGM.93Jan2014723@titan.cs.rice.edu> <C07z6L.1Aq@world.std.com> <1993Jan5.043456.29130@micor.ocunix.on.ca> Sender: news@organpipe.uug.arizona.edu Reply-To: dave@cs.arizona.edu (Dave Schaumann) Organization: University of Arizona Lines: 43 In-Reply-To: goss@micor.ocunix.on.ca (Timothy D. Goss) In article <1993Jan5.043456.29130@micor.ocunix.on.ca>, goss@micor (Timothy D. Goss) writes: >You can locate ALL duplicates in a list of integers with a single pass of the >list which is O(n) in time but it will require an array with 64K elements. The >program below should explain... > >typedef unsigned char byte; > >static byte found[65536]; >int intlist[NUM]; > >void main(void) { > > int i; int check; > > /* assume intlist is filled with NUM integers */ > > for (i=0; i<NUM; i++) { > check = intlist[i]; > if (found[check] > 0 && found[check] != 0xff) { > printf("%d\n", check); > found[check] = 0xff; /* don't report same dupe twice */ > } else { > found[check] += 1; /* first time we saw this one */ > } > } > >} This is fine if you know all of your values are between 0 and 65536. A slightly more robust (but still linear time) algorithm could be written which would only require that the difference between the maximum and minimum elements is 65536. Also, if you use a bit-vector instead of an array of chars for "found", you get an 8x space improvement at the cost of a small amound of bit twiddling. Hmmm. An even more robust algorithm would scan through the array to find the difference between the largest and smallest elements, and then malloc() an appropriate sized array. Of course, if malloc() returns NULL, you'll have to fall back to some kind of in-place sorting scheme. -- Dave Schaumann dave@cs.arizona.edu