NetNews Usenet Archive 1993 #1

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1993 #1 / NN_1993_1.iso / spool / comp / programm / 3437 < prev next >

Wrap

Internet Message Format | 1993-01-08 | 1.5 KB

Path: sparky!uunet!gatech!ncar!noao!amethyst!organpipe.uug.arizona.edu!news From: dave@cs.arizona.edu (Dave Schaumann) Newsgroups: comp.programming Subject: Re: Locating duplicates Message-ID: <1993Jan8.051051.18788@organpipe.uug.arizona.edu> Date: 8 Jan 93 05:10:51 GMT References: <C07FJo.4vq@ux1.cso.uiuc.edu> <1993Jan2.041658.21596@organpipe.uug.arizona.edu> <1993Jan5.150010.27104@ulysses.att.com> <MARKT.93Jan7122729@wundt.harlqn.co.uk> Sender: news@organpipe.uug.arizona.edu Reply-To: dave@cs.arizona.edu (Dave Schaumann) Organization: University of Arizona Lines: 18 In-Reply-To: markt@harlqn.co.uk (Mark Tillotson) In article <MARKT.93Jan7122729@wundt.harlqn.co.uk>, markt@harlqn (Mark Tillotson) writes: >Use a hash table from integers to the number of occurrences, then you >do N hash-lookups, and one scan of the hash table looking for >duplicates, giving an expected time O(N), But what happens when the number of elements to be scanned is much larger than the number of buckets you choose? You get collisions up the wazoo, and then your algorithm is suddenly O(N*f(N/#buckets)), where your collision resolution algorithm is O(f(N/#buckets)). If you choose the number of buckets at run-time based on the number of elements, you might as well just use the "radix sort" solution already posted. -- You're traveling through another dimension -- a dimension not only of sight and sound but of mind. A journey into a wonderous land whose boundaries are that of imagination. That's a signpost up ahead: your next stop -- the Twilight Zone!