home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!gatech!destroyer!ncar!noao!amethyst!organpipe.uug.arizona.edu!news
- From: dave@cs.arizona.edu (Dave Schaumann)
- Newsgroups: comp.programming
- Subject: Re: Locating duplicates
- Message-ID: <1993Jan5.224141.21639@organpipe.uug.arizona.edu>
- Date: 5 Jan 93 22:41:41 GMT
- References: <1993Jan2.041658.21596@organpipe.uug.arizona.edu> <DOUGM.93Jan2014723@titan.cs.rice.edu> <C07z6L.1Aq@world.std.com> <1993Jan5.043456.29130@micor.ocunix.on.ca>
- Sender: news@organpipe.uug.arizona.edu
- Reply-To: dave@cs.arizona.edu (Dave Schaumann)
- Organization: University of Arizona
- Lines: 43
- In-Reply-To: goss@micor.ocunix.on.ca (Timothy D. Goss)
-
- In article <1993Jan5.043456.29130@micor.ocunix.on.ca>, goss@micor (Timothy D. Goss) writes:
- >You can locate ALL duplicates in a list of integers with a single pass of the
- >list which is O(n) in time but it will require an array with 64K elements. The
- >program below should explain...
- >
- >typedef unsigned char byte;
- >
- >static byte found[65536];
- >int intlist[NUM];
- >
- >void main(void) {
- >
- > int i; int check;
- >
- > /* assume intlist is filled with NUM integers */
- >
- > for (i=0; i<NUM; i++) {
- > check = intlist[i];
- > if (found[check] > 0 && found[check] != 0xff) {
- > printf("%d\n", check);
- > found[check] = 0xff; /* don't report same dupe twice */
- > } else {
- > found[check] += 1; /* first time we saw this one */
- > }
- > }
- >
- >}
-
- This is fine if you know all of your values are between 0 and 65536.
- A slightly more robust (but still linear time) algorithm could be written
- which would only require that the difference between the maximum and minimum
- elements is 65536.
-
- Also, if you use a bit-vector instead of an array of chars for "found", you
- get an 8x space improvement at the cost of a small amound of bit twiddling.
-
- Hmmm. An even more robust algorithm would scan through the array to find
- the difference between the largest and smallest elements, and then malloc()
- an appropriate sized array. Of course, if malloc() returns NULL, you'll
- have to fall back to some kind of in-place sorting scheme.
-
- --
- Dave Schaumann dave@cs.arizona.edu
-