home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!pipex!warwick!uknet!mcsun!corton!loria!loria.crin.fr!eker
- From: eker@loria.crin.fr (Steven Eker)
- Newsgroups: comp.sys.atari.st.tech
- Subject: Re: Cross Compiling Sozobon 2.0
- Keywords: hash function
- Message-ID: <567@muller.loria.fr>
- Date: 5 Nov 92 17:07:43 GMT
- References: <1992Nov2.133525.7992@sol.ctr.columbia.edu> <1992Nov3.140438.19093@news.lrz-muenchen.de>
- Sender: news@news.loria.fr
- Organization: CRIN (CNRS) Nancy - INRIA Lorraine
- Lines: 54
-
- In article <1992Nov3.140438.19093@news.lrz-muenchen.de>, HUSFELD@usmv01.usm.uni-muenchen.de (Husfeld, Dirk) writes:
- |> In <1992Nov2.133525.7992@sol.ctr.columbia.edu> richard@op.ph.ic.ac.uk writes:
-
- [discussion of what hashing is supposed to be deleted]
-
- |> >
- |> > hash(s)
- |> > unsigned char *s;
- |> > {
- |> > unsigned char c;
- |> > register i, n = 0;
- |> >
- |> > for (i=1; i<5; i++) {
- |> > c = s[i];
- |> > if (!c)
- |> > break;
- |> > n = (n<<1)|c;
- |> > }
- |> > return n & (NHASH-1); /* NHASH is the max value */
- |> > }
- |> >
- |> >
- |> > I've spotted that this routine only uses the 1st to fifth letters of
-
- Actually 1st -> 4th letters.
-
- |> > the name (skipping the first underscore), and that this is the same
- |> > for both __initial_stack and __init_signal, but increasing the number
- |> > doesn't help - in fact the routine only seems to return hash values
- |> > of 46,61,62,63 whatever is passed through to it when it uses seven
- |> > characters.
-
- This is a truly _awful_ hash function; simply ORing and shifting is
- going to produce a heavy bias towards hash values with lots of ones -
- especially in the more significant bits:
- 46=%101110, 61=%111101, 62=%111110, 63=%111111
- (one can guess that NHASH=64)
-
- Replace the OR (|) with addition (+) or exclusive-OR (^) for some improvement.
-
- |> >
- |> One might argue over the sensibility of a hash function that only looks
- |> at the first characters of a name. It is often a bad approach because
- |> some programmers enumerate their variables (like file1, file2,...;
- |> not recommended), others use a verb-object template for their function
- |> names (get_this(), get_that(), get_next()...). Nevertheless, a bad hash
- |> function only slows down the calling code because it has to do more work
- |> to solve the name clash.
-
- Agreed. I believe the Sozobug loader only deals the first 8 characters
- of a name so I'd hash on 1 thu 7 (starting at zero!);
- i.e. replace the 5 with an 8 in the above code.
-
- Steven
-