Text File | 1994-06-01 | 2KB | 79 lines
3+IES -> Y
3+ING ->
SS -> SS
3+S ->
4+ION ->
4+ISM ->
4+LY ->
3+EED -> EE
4+IED -> Y
4+ED ->
4+ER ->
4+NESS ->
4+FUL ->
4+ABLE ->
4+IBLE ->
3+V -> F
4+E ->
3+DD -> D
3+GG -> G
3+LL -> L
3+MM -> M
3+NN -> N
3+PP -> P
3+RR -> R
3+SS -> S
3+TT -> T
------------------------------------------------------------------
Customized Stemming
===================
Stemming rules vary from one language to another. dtSearch
includes a set of stemming rules designed to work with English.
These rules are in the file STEMMING.DAT. If you need to
implement stemming for a different language, or you want to
modify the English stemming rules, you can create a new set of
stemming rules to be used in place of STEMMING.DAT.
Stemming rules consist of a series of lines like this:
    3+IES -> Y
    4+ING ->
The first rule converts any word that has three or more letters
before a final IES into the same initial letters followed by Y.
APPLIES would turn into APPLY.
The second rule removes the ING from any word that has four or
more letters before a final ING. FISHING would turn into FISH, but
SING would not change because only one letter precedes the ING.
In general, a rule consists of: a minimum number of letters (not
including the suffix), a + sign, a suffix to be removed, an arrow
(->) and the replacement for the suffix, if any.
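The rule format described above can be sketched as a small parser.
This is an illustration only, not dtSearch's own code: the names
Rule, RULE_RE and parse_rule are hypothetical, and treating a rule
with no "N+" prefix as a minimum of zero letters is an assumption.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    min_letters: int   # letters required before the suffix
    suffix: str        # suffix to be removed
    replacement: str   # text put in its place; may be empty


# Optional "N+", a suffix, an arrow, and an optional replacement.
RULE_RE = re.compile(r"^\s*(?:(\d+)\+)?([A-Za-z]+)\s*->\s*([A-Za-z]*)\s*$")


def parse_rule(line: str) -> Optional[Rule]:
    """Parse one rule line; return None if the line is not a rule."""
    match = RULE_RE.match(line)
    if match is None:
        return None
    minimum, suffix, replacement = match.groups()
    # Assumption: a missing "N+" prefix means no minimum length.
    return Rule(int(minimum or 0), suffix.upper(), replacement.upper())


print(parse_rule("3+IES -> Y"))
print(parse_rule("4+ING ->"))
```

Lines that are not rules, such as headings or ordinary prose, make
parse_rule return None, so a loader built on it could skip them.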
When stemming a word, dtSearch checks the rules in order until it
finds one that applies. It then applies that rule and starts over
from the first rule, repeating the process until no rule applies
or the matching rule leaves the word unchanged. The result is the
"stem" of the original word.
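The procedure just described can be sketched in a few lines of
Python. This is a minimal sketch of the documented behavior, not
dtSearch's implementation: rules are written as hypothetical
(min_letters, suffix, replacement) tuples, and the max_passes guard
against rule sets that never settle is an added assumption.

```python
def apply_first_match(word, rules):
    """Apply the first rule whose suffix and minimum length match."""
    for min_letters, suffix, replacement in rules:
        stem_length = len(word) - len(suffix)
        if word.endswith(suffix) and stem_length >= min_letters:
            return word[:stem_length] + replacement
    return word  # no rule applies


def stem(word, rules, max_passes=50):
    """Restart from the first rule until the word stops changing."""
    word = word.upper()
    for _ in range(max_passes):  # assumed safety limit, not from the text
        new_word = apply_first_match(word, rules)
        if new_word == word:
            break
        word = new_word
    return word


EXAMPLE_RULES = [(3, "IES", "Y"), (4, "ING", "")]

print(stem("APPLIES", EXAMPLE_RULES))  # APPLY
print(stem("FISHING", EXAMPLE_RULES))  # FISH
print(stem("SING", EXAMPLE_RULES))     # SING
```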
Sometimes you may want to create a rule with an exception. For
example, suppose you want to remove a trailing S in a word,
unless the word ends in SS. To do this, you would use these two
rules:
    3+SS -> SS
    3+S ->
If a word ends in SS, dtSearch will never get past the first rule
and will give up stemming the word because the rule 3+SS -> SS
does not change the word. Only words not ending in SS will get
to the next rule, which removes the trailing S.
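The effect of the exception rule can be seen by stemming the same
words with and without it. The sketch below assumes the
first-match, repeat-until-unchanged behavior described earlier; the
tuple layout and the names GUARDED and UNGUARDED are hypothetical.

```python
def stem(word, rules, max_passes=50):
    """Apply the first matching rule repeatedly until nothing changes."""
    word = word.upper()
    for _ in range(max_passes):  # assumed guard against endless loops
        new_word = word
        for min_letters, suffix, replacement in rules:
            stem_length = len(word) - len(suffix)
            if word.endswith(suffix) and stem_length >= min_letters:
                new_word = word[:stem_length] + replacement
                break  # only the first applicable rule is used
        if new_word == word:
            break
        word = new_word
    return word


GUARDED = [(3, "SS", "SS"), (3, "S", "")]   # exception rule first
UNGUARDED = [(3, "S", "")]                  # exception rule missing

print(stem("GLASS", GUARDED))    # GLASS
print(stem("BOOKS", GUARDED))    # BOOK
print(stem("GLASS", UNGUARDED))  # GLA
```

Without the exception rule, GLASS loses one S on each pass and ends
up as GLA, which is why the order of the two rules matters.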
Setting up stemming rules can be somewhat tricky. To help,
dtSearch includes the STEMTEST utility. STEMTEST lets you try out
your stemming rules by entering words and seeing the resulting
stems.
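A similar check can be scripted when rules are being edited often.
The sketch below is a hypothetical stand-in and does not reproduce
STEMTEST, whose interface is not described here. It assumes the
first-match, repeat-until-unchanged behavior described earlier, and
the names check_rules, EXPECTED, GOOD_ORDER and BAD_ORDER are
illustrative.

```python
def stem(word, rules, max_passes=50):
    """Apply the first matching rule repeatedly until nothing changes."""
    word = word.upper()
    for _ in range(max_passes):  # assumed guard against endless loops
        new_word = word
        for min_letters, suffix, replacement in rules:
            stem_length = len(word) - len(suffix)
            if word.endswith(suffix) and stem_length >= min_letters:
                new_word = word[:stem_length] + replacement
                break
        if new_word == word:
            break
        word = new_word
    return word


def check_rules(rules, expectations):
    """Return (word, expected, actual) for every stem that differs."""
    mismatches = []
    for word, expected in expectations.items():
        actual = stem(word, rules)
        if actual != expected:
            mismatches.append((word, expected, actual))
    return mismatches


EXPECTED = {"APPLIES": "APPLY", "FISHING": "FISH",
            "SING": "SING", "GLASS": "GLASS"}
GOOD_ORDER = [(3, "IES", "Y"), (4, "ING", ""), (3, "SS", "SS"), (3, "S", "")]
BAD_ORDER = [(3, "IES", "Y"), (4, "ING", ""), (3, "S", ""), (3, "SS", "SS")]

print(check_rules(GOOD_ORDER, EXPECTED))  # []
print(check_rules(BAD_ORDER, EXPECTED))   # [('GLASS', 'GLASS', 'GLA')]
```

Keeping a table of words with their intended stems makes it easy
to notice when reordering or adding a rule changes earlier results.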