- Path: soap.news.pipex.net!pipex!usenet
- From: m.hendry@dial.pipex.com (Mathew Hendry)
- Newsgroups: comp.sys.amiga.programmer
- Subject: Re: "Fuzzy" string searching
- Date: Sun, 11 Feb 96 00:31:40
- Organization: Private node.
- Distribution: world
- Message-ID: <19960211.4F6A38.F17@am141.du.pipex.com>
- References: <311cf85e@beachyhd.demon.co.uk>
- NNTP-Posting-Host: am141.du.pipex.com
- X-Newsreader: TIN [AMIGA 1.3 950726BETA PL0]
-
- Adam@beachyhd.demon.co.uk wrote:
- : I wish to write a function that will perform a 'fuzzy' search for one string
- : within another.
- :
- : The most simple level of search is to just check if the 'search' string is in
- : the 'source' string. So if I'm searching for 'test' in the string 'this is a
- : test', it will obviously be found.
- :
- : But I need to go further than that, and provide more intelligent searching. If
- : I search for the word 'recent' in the string 'currency/recency', I still want a
- : very close match to be indicated.
- :
- : This becomes more difficult when searching (for example) for the word
- : 'workload' in the string 'high work-load'. Never more than 4 characters match
- : at any time, though obviously this should provide a very close match.
- :
- : The idea is similar to that used in spell-checkers. When the program doesn't
- : recognise your word, it manages to search its dictionary for things it thinks
- : are very close.
- :
- : Does anyone have any suggestions or code that might allow me to perform this
- : sort of search? Ideally a function that I pass two strings to, and which
- : returns a score value within a given range (say, 0 to 100)?
-
- Have a look at:
- AGrep204.lha util/sys 35K 155+Fastest available Grep (>1MB, >=2.0)
-
- which is a version of grep supporting inexact matching of substrings. When
- running it, you can specify how many character substitutions, insertions and
- deletions the search pattern may undergo and still count as a match.
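The kind of inexact matching described above can be sketched with the classic dynamic-programming edit distance, adapted so that a match may start and end anywhere in the text (Sellers' variant). This is not agrep's own code: `approx_distance` and `fuzzy_score` are hypothetical names, and mapping the error count onto 0..100 is just one assumed way to produce the score asked for in the original post.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: smallest number of substitutions, insertions and
   deletions needed to turn `pat` into some substring of `text`
   (Sellers' dynamic programming, one column kept at a time).
   Returns -1 if memory cannot be allocated. */
int approx_distance(const char *pat, const char *text)
{
    size_t m = strlen(pat);
    size_t i;
    int best;
    int *col = malloc((m + 1) * sizeof *col);

    if (col == NULL)
        return -1;

    /* Column for the empty text prefix: the first i pattern chars
       can only be matched by deleting them, costing i errors. */
    for (i = 0; i <= m; i++)
        col[i] = (int)i;
    best = col[m];

    for (; *text != '\0'; text++) {
        int diag = col[0];   /* D[i-1][j-1] */
        col[0] = 0;          /* a match may start at any text position */
        for (i = 1; i <= m; i++) {
            int left = col[i];                            /* D[i][j-1] */
            int cost = diag + (pat[i - 1] != *text);      /* match / substitute */
            if (col[i - 1] + 1 < cost) cost = col[i - 1] + 1; /* skip pattern char */
            if (left + 1 < cost) cost = left + 1;             /* skip text char */
            diag = left;
            col[i] = cost;
        }
        if (col[m] < best)   /* a match may end at any text position */
            best = col[m];
    }
    free(col);
    return best;
}

/* Hypothetical scoring: 100 for an exact occurrence, dropping linearly
   with the number of errors relative to the pattern length. */
int fuzzy_score(const char *pat, const char *text)
{
    size_t m = strlen(pat);
    int d;

    if (m == 0)
        return 100;
    d = approx_distance(pat, text);
    if (d < 0 || (size_t)d >= m)
        return 0;
    return (int)(100 * (m - (size_t)d) / m);
}
```

With the examples from the question, `fuzzy_score("recent", "currency/recency")` gives 83 (one substitution out of six characters) and `fuzzy_score("workload", "high work-load")` gives 87 (one extra hyphen). The cost is proportional to the pattern length times the text length, which is fine for short strings but much slower than what agrep does.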
-
- Source is not included in that archive, but I believe there are pointers to
- public source in the archive.
-
- As a side effect, AGrep is also the fastest grep around (someone based their
- PhD thesis around the searching algorithms used in it...).
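Much of agrep's speed comes from bit-parallel matching: the Wu-Manber extension of the shift-and ("bitap") algorithm keeps one machine word of state per allowed error count and updates all pattern positions with a few shifts and ORs per text character. The following is a simplified sketch of that idea, not agrep's actual code; `bitap_search` and `MAX_ERRORS` are hypothetical names, and the pattern is assumed to be shorter than the bit width of an `unsigned long`.

```c
#include <limits.h>
#include <string.h>

#define MAX_ERRORS 8  /* hypothetical limit for this sketch */

/* Hypothetical helper: returns the index in `text` just past the end of the
   first substring matching `pat` with at most `k` substitutions, insertions
   or deletions, or -1 if there is none or the arguments are out of range.
   Bit i of R[d] set means "the first i+1 pattern chars match text ending
   here with at most d errors". */
long bitap_search(const char *text, const char *pat, int k)
{
    unsigned long masks[UCHAR_MAX + 1];
    unsigned long R[MAX_ERRORS + 1];
    unsigned long goal;
    size_t m = strlen(pat);
    size_t i;
    int d;
    long pos;

    if (m == 0)
        return 0;  /* the empty pattern matches immediately */
    if (m >= sizeof(unsigned long) * CHAR_BIT || k < 0 || k > MAX_ERRORS)
        return -1;

    /* masks[c] has bit i set wherever pat[i] == c. */
    memset(masks, 0, sizeof masks);
    for (i = 0; i < m; i++)
        masks[(unsigned char)pat[i]] |= 1UL << i;

    /* Before any text: the first d pattern chars can be deleted. */
    for (d = 0; d <= k; d++)
        R[d] = (1UL << d) - 1;
    goal = 1UL << (m - 1);
    if (R[k] & goal)
        return 0;

    for (pos = 0; text[pos] != '\0'; pos++) {
        unsigned long mask = masks[(unsigned char)text[pos]];
        unsigned long prev_old = R[0];  /* R[d-1] before this character */

        R[0] = ((R[0] << 1) | 1UL) & mask;  /* exact shift-and step */
        for (d = 1; d <= k; d++) {
            unsigned long old = R[d];
            R[d] = (((old << 1) | 1UL) & mask)  /* char matches */
                 | prev_old                     /* extra text char */
                 | (prev_old << 1)              /* substituted char */
                 | (R[d - 1] << 1)              /* skipped pattern char */
                 | 1UL;                         /* first char via one error */
            prev_old = old;
        }
        if (R[k] & goal)
            return pos + 1;
    }
    return -1;
}
```

For example, `bitap_search("high work-load", "workload", 0)` finds nothing, while allowing one error (`k = 1`) finds the hyphenated form. Each text character costs about `k + 1` word operations regardless of pattern length, which is where the speed comes from.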
-
- -- Mat.
-