home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.programming
- Path: sparky!uunet!pipex!harlqn.co.uk!harlqn!markt
- From: markt@harlqn.co.uk (Mark Tillotson)
- Subject: Re: Locating duplicates
- In-Reply-To: kpv@ulysses.att.com's message of Tue, 5 Jan 1993 15:00:10 GMT
- Message-ID: <MARKT.93Jan7122729@wundt.harlqn.co.uk>
- Lines: 29
- Sender: news@harlqn.co.uk (Usenet News Account)
- Organization: Harlequin Limited, Cambridge, England
- References: <C07FJo.4vq@ux1.cso.uiuc.edu> <1993Jan2.041658.21596@organpipe.uug.arizona.edu>
- <1993Jan5.150010.27104@ulysses.att.com>
- Date: Thu, 7 Jan 1993 12:27:25 GMT
-
- In article <1993Jan5.150010.27104@ulysses.att.com> kpv@ulysses.att.com
- (Phong Vo) writes:
-
- > In article <1993Jan2.041658.21596@organpipe.uug.arizona.edu> dave@cs.arizona.edu (Dave Schaumann) writes:
- > >In article <C07FJo.4vq@ux1.cso.uiuc.edu>, ceblair@ux1 (Charles Blair) writes:
- > >> Given a large array of integers. Is there a way to identify duplicates
- > >>which is substantially less work than sorting the array?
- > >
- > >...lots of good suggestions for the general case...
- >
- > But most often this kind of question comes up in practice
- > because you have a large array of integers in a small range.
- > In that case, using a bit map for the range, you can do it
- > in linear time and space proportion to the range size.
-
- Use a hash table from integers to the number of occurrences, then you
- do N hash-lookups, and one scan of the hash table looking for
- duplicates, giving an expected time O(N), with no bound on the range
- of integers (other than word-size, and address-space!). You need a
- suitable hash function, but for integers taking the modulo w.r.t. some
- prime p, where p is about the size of the incoming array.
-
-
- ------------------------------------------------------
- |\ /| | , M. Tillotson Harlequin Ltd. \
- | \/ | /\| |/\ |< markt@uk.co.harlqn Barrington Hall,\
- | | \_| | | \ +44 223 872522 Barrington, \
- I came, I saw, I core-dumped... Cambridge CB2 5RG \
-
-