- Path: sparky!uunet!mcsun!sun4nl!and!jos
- From: jos@and.nl (Jos Horsmeier)
- Newsgroups: comp.programming
- Subject: Re: Soundex algorithms, database indexing
- Keywords: soundex, databases, indexing
- Message-ID: <3166@dozo.and.nl>
- Date: 24 Jul 92 12:20:17 GMT
- References: <1992Jul22.122831.27758@cbnews.cb.att.com>
- Organization: AND Software BV Rotterdam
- Lines: 55
-
- [ I've tried to mail you, but it bounced, so I post it here ... ]
-
- In article <1992Jul22.122831.27758@cbnews.cb.att.com> Marc@cbnews.cb.att.com writes:
- |Hello programmers/researchers,
-
- [ names stored in a database ... ]
-
- | For the latter one I want to set up a soundex key as well. I have heard of the soundex algorithm but never seen any practical applications.
-
- | I would like to get info/ a 'C' program (function) that makes use of
- | the Soundex algorithm.
-
- | Also any information on using index files (for fast searching) or
- | 'C' code would be greatly appreciated.
-
- Basically, the Soundex algorithm goes like this:
-
- Chop the first character from the name and set it aside for the last step
-
- - Map all the letters according to the following table:
-
- - b, f, p, v                ---> 1
- - c, g, j, k, q, s, x, z    ---> 2
- - d, t                      ---> 3
- - l                         ---> 4
- - m, n                      ---> 5
- - r                         ---> 6
- - a, e, h, i, o, u, w, y    ---> 7
-
- Wherever a run of equal digits occurs, remove all but the first digit of the run
-
- - Optionally, chop the first digit from the number
-
- - Optionally, remove all sevens (all vowels)
-
- - Add some 0's at the end (dependent on the number `n' in the next step)
-
- - Take the first n digits and prepend the letter from the first step.
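
The steps above can be sketched in C roughly as follows. This is only
one reading of the recipe, not a reference implementation: it assumes
ASCII input, skips anything that is not a letter, always removes the
sevens, and never applies the optional "chop the first digit" step.
The names soundex() and soundex_table are made up for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Digit for each letter 'a'..'z', taken from the table above
   (7 marks the vowel-like letters a, e, h, i, o, u, w, y). */
static const char soundex_table[] = "71237127722455712623717272";

/* Writes the first letter of name followed by exactly n digits into out,
   which must have room for n + 2 bytes (letter, digits, terminator).
   Returns 0 on success, -1 if name contains no letter at all. */
int soundex(const char *name, size_t n, char *out)
{
    size_t len = 0;
    char prev = '\0';   /* last digit seen, used to collapse runs */

    /* Step 1: find the first letter and set it aside (upper-cased). */
    while (*name != '\0' && !isalpha((unsigned char)*name))
        name++;
    if (*name == '\0')
        return -1;
    out[len++] = (char)toupper((unsigned char)*name++);

    for (; *name != '\0' && len < n + 1; name++) {
        int c = tolower((unsigned char)*name);
        char digit;

        if (c < 'a' || c > 'z')
            continue;               /* assumption: ignore non-letters */

        digit = soundex_table[c - 'a'];     /* step 2: map the letter */
        if (digit == prev)
            continue;               /* step 3: collapse runs of equal digits */
        prev = digit;
        if (digit == '7')
            continue;               /* step 5: drop the sevens */
        out[len++] = digit;
    }

    /* Steps 6 and 7: pad with zeros up to n digits, then terminate. */
    while (len < n + 1)
        out[len++] = '0';
    out[len] = '\0';
    return 0;
}
```

With n = 3, both "Robert" and "Rupert" come out as "R163", so a
strcmp() of the two codes returning 0 is the "sounds alike" test.
Note that runs are collapsed before the sevens are dropped, so two
equal consonants separated by a vowel are both kept.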
-
- The table given in step 2 is highly language-dependent, of course;
- e.g. in Dutch a dental `t' might be pronounced as a dental `s', and
- certain combinations of vowels result in different diphthongs.
-
- So, transform both names into their Soundex form and compare the
- two results for equality. That's all there is to it ...
-
- About your question on indexes: I'd suggest a hashing scheme if it's
- unlikely that records will be removed from the database frequently.
- On the other hand, a B-tree will do fine too here.
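
To give an idea of the hashing route, here is a small in-memory sketch
that maps a key string (a Soundex code, say) to record numbers, using
a chained hash table. Everything in it is an assumption made for
illustration: the names index_insert() and index_lookup(), the bucket
count, the hash function and the 8-byte key limit. A real index file
would keep the table on disk, and nothing here handles deletion.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1021           /* arbitrary prime, for illustration only */

struct entry {
    char key[8];                /* e.g. a Soundex code such as "R163" */
    long recno;                 /* record number in the data file */
    struct entry *next;         /* next entry in the same bucket */
};

static struct entry *buckets[NBUCKETS];

/* Simple multiplicative string hash, reduced to a bucket number. */
static unsigned hash(const char *s)
{
    unsigned h = 0;

    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Adds (key, recno) to the index. Returns 0 on success, -1 on failure
   (key too long or out of memory). */
int index_insert(const char *key, long recno)
{
    struct entry *e;
    unsigned h;

    if (strlen(key) >= sizeof e->key)
        return -1;
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    strcpy(e->key, key);
    e->recno = recno;
    h = hash(key);
    e->next = buckets[h];       /* push on the front of the chain */
    buckets[h] = e;
    return 0;
}

/* Stores up to max record numbers filed under key in out (most recently
   inserted first) and returns the total number of matches found. */
size_t index_lookup(const char *key, long *out, size_t max)
{
    const struct entry *e;
    size_t found = 0;

    for (e = buckets[hash(key)]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            if (found < max)
                out[found] = e->recno;
            found++;
        }
    }
    return found;
}
```

Since many names share one Soundex code, a lookup has to return a
list of candidate records rather than a single one; the caller then
checks the actual names in those records.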
-
- kind regards,
-
- Jos aka jos@and.nl
-
-