home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!mcsun!sun4nl!and!jos
- From: jos@and.nl (Jos Horsmeier)
- Newsgroups: comp.programming
- Subject: Re: Help With String Comparisons
- Message-ID: <4340@dozo.and.nl>
- Date: 12 Jan 93 10:09:46 GMT
- References: <1993Jan11.224515.24729@dragon.acadiau.ca>
- Organization: AND Software BV Rotterdam
- Lines: 43
-
- In article <1993Jan11.224515.24729@dragon.acadiau.ca> 890584r@dragon.acadiau.ca (Barrie Rody) writes:
-
- | I am writing a database and I need some help in comparing strings.
- |What I need is an algorithm to compare 2 strings {PChar, array[]of Char}.
- |I need this algorithm to be able to identify similar strings.
- |ie. a misspelled word would be found in a list.
- | I am writing this in Turbo Pascal for Windows. Any pseudocode,
- |Pascal, C, C++ or text answers would be greatly appreciated.
-
- Two approached come to mind:
-
- - Soundex matching:
-
- 1) Map the consonants to their respective classes:
-
- - gutturals: g, h, k, q, r, x
- - dentals : c, d, s, t, z
- - labials : b, f, j, l, m, n, p, v, w
-
- 2) Remove identical adjacent phonemes and compare the two resulting strings
-
- - `Gestalt' matching (function P):
-
- 1) Given two strings V and W, find the largest substring X, such that
- V= Vl X Vr and W = Wl X Wr. In case of a tie, take one X at random.
- Let S equal 2*|X| (the length of the largest substring X)
-
- 2) Let L be the value of P applied to Vl and Wl (if both not empty)
- and let R be the value of P applied to Vr and Wr (if both not empty)
-
- 3) The 'match value' is (L+S+R)/(|V|+|W|)
-
- An example may clarify the second method a bit. Consider two strings
- `color' and `colour'. The largest substring is `colo'. There are no
- prefix substrings and the two suffix substrings are `r' and `or'.
- Applying the same method to those strings yields the value 1 (`r').
- So, the total `match value' is (2*0+2*4+2*1)/(5+6) = 10/11.
-
- I hope this helps a bit,
-
- kind regards,
-
- Jos aka jos@and.nl
-