- Newsgroups: comp.lang.pascal
- Path: sparky!uunet!spool.mu.edu!agate!rsoft!mindlink!a499
- From: Robert_Salesas@mindlink.bc.ca (Robert Salesas)
- Subject: Re: Pattern Matching for Spelling Correction
- Organization: MIND LINK! - British Columbia, Canada
- Date: Thu, 7 Jan 1993 10:21:32 GMT
- Message-ID: <19332@mindlink.bc.ca>
- Sender: news@deep.rsoft.bc.ca (Usenet)
- Lines: 45
-
- > Tim Ciceran writes:
- >
- > Msg-ID: <1993Jan7.054651.25174@spartan.ac.BrockU.CA>
- > Posted: Thu, 7 Jan 1993 05:46:51
- >
- > Org. : Brock University, St. Catharines Ontario
- >
- > I'm currently working on the front-end to an information retrieval
- > package and would like to incorporate a facility to provide for
- > spelling correction. Does anyone have any recommendations for an
- > efficient pattern matching algorithm which could be used in this
- > capacity?
- >
- > The dictionary is rather large (but static) and I would prefer having
- > the option to search the entries either on disk or through a table
- > in memory.
- >
- > Any insights or references would be especially helpful.
- >
- > Thanks in advance,
- >
- > TMC
- >
-
-
-
- I too would be interested in hearing ideas. I'd like to get a spell checker
- with suggestions working, using a 120,000-word dictionary (which I already
- have). The obvious way to proceed is a hash table, with Soundex or Metaphone
- codes for matching. However, my calculations come out to a table of about
- 720,000 bytes (roughly 700 KB), not counting the dictionary itself! (That's
- 6 bytes per word: hash key, metaphone key, and index to word.) So it isn't
- so obvious any more. How does one go about this? Compression doesn't do much
- on quasi-random data, although I can get very good string-repetition
- compression on the dictionary itself. Any ideas?
-
-
- Rob
-
- --
- \------------------------------------------------------------------------/
- \ Robert Salesas | Internet: Robert_Salesas@mindlink.bc.ca /
- \ Eschalon Development Inc. | CIS: 76625,1320 Tel/Fax: 604-520-1543 /
- \------------------------------------------------------------------------/
-
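As a quick check on the table-size estimate in the post, the arithmetic can be worked through as below. This is only a sketch in Python (not Pascal). The word count and the 6 bytes per entry come from the post; the constant names are made up for illustration.

```python
WORDS = 120_000        # dictionary size quoted in the post
BYTES_PER_ENTRY = 6    # hash key + metaphone key + index to word, per the post

table_bytes = WORDS * BYTES_PER_ENTRY
table_kb = table_bytes / 1024

# Number of bits needed to give every word a distinct index (0 .. WORDS-1).
index_bits = (WORDS - 1).bit_length()

print(table_bytes)       # 720000
print(round(table_kb))   # 703
print(index_bits)        # 17
```

One thing the numbers suggest: if the 6 bytes were split evenly into three 2-byte fields (an assumption, the post does not say), a 16-bit word index could address only 65,536 entries, so a 120,000-word dictionary would need a wider index or an index relative to a block.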
-
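To make the "phonetic code for matching" idea concrete, here is a minimal sketch of grouping a word list by Soundex code and looking up suggestions for a misspelling. It is written in Python rather than Pascal, uses plain Soundex rather than Metaphone, and the names `soundex`, `build_index` and `suggest` are hypothetical. It is not meant as the approach either poster ended up using.

```python
from collections import defaultdict

# Standard Soundex digit classes; vowels, H, W and Y carry no digit.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of word ('' if it has no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":       # H and W do not separate equal codes
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def build_index(words):
    """Group words by Soundex code: one bucket per code, not one key per word."""
    index = defaultdict(list)
    for word in words:
        index[soundex(word)].append(word)
    return index


def suggest(index, misspelled):
    """Return dictionary words sharing the misspelling's Soundex code."""
    return index.get(soundex(misspelled), [])


index = build_index(["receive", "recipe", "pattern", "spelling"])
print(suggest(index, "recieve"))   # ['receive', 'recipe']
print(suggest(index, "patern"))    # ['pattern']
```

Because the words are grouped under their code, the code is stored once per bucket instead of once per word. For a static dictionary, the same grouping could be laid out on disk as a list sorted by code, so that a lookup reads one contiguous run of words.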