Package smile.nlp.stemmer
Class PorterStemmer
java.lang.Object
smile.nlp.stemmer.PorterStemmer
Porter's stemming algorithm. The stemmer is based on the idea that
suffixes in the English language are mostly made up of combinations of
smaller and simpler suffixes. This is a linear step stemmer.
Specifically, it has five steps, each applying a set of rules. Within
each step, if a suffix rule matches a word, the conditions attached to
that rule are tested on the stem that would result if the suffix were
removed in the way the rule defines. Once a rule passes its conditions
and is accepted, the rule fires, the suffix is removed, and control
moves to the next step. If the rule is not accepted, the next rule in
the step is tested, until either a rule from that step fires and control
passes to the next step, or there are no more rules in that step, in
which case control also moves to the next step.
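The step-and-rule control flow described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Smile's implementation: the class `StepStemmerSketch`, the `Rule` record and its `minMeasure` field are hypothetical, and the rule table holds only a handful of rules taken from Porter's steps 1a, 2 and 4, with each condition reduced to a lower bound on the measure m of the stem (the number of vowel-consonant sequences when the stem is written as [C](VC)^m[V]). It follows the description literally: when a matching rule's condition fails, the next rule in the same step is tried.

```java
import java.util.List;

public class StepStemmerSketch {
    // Hypothetical rule shape: fires when the word ends with `suffix` and the
    // stem left after removing it has measure m > minMeasure (-1 = no condition).
    record Rule(String suffix, String replacement, int minMeasure) {}

    // Hypothetical, heavily reduced rule table: a few of Porter's rules from
    // steps 1a, 2 and 4, not the full algorithm.
    static final List<List<Rule>> STEPS = List.of(
            List.of(new Rule("sses", "ss", -1), new Rule("ies", "i", -1),
                    new Rule("ss", "ss", -1), new Rule("s", "", -1)),
            List.of(new Rule("ational", "ate", 0), new Rule("tional", "tion", 0),
                    new Rule("izer", "ize", 0)),
            List.of(new Rule("ance", "", 1), new Rule("al", "", 1), new Rule("er", "", 1)));

    // Porter's consonant test: a, e, i, o, u are vowels, and y is a vowel
    // when it follows a consonant; every other letter is a consonant.
    static boolean isConsonant(String s, int i) {
        switch (s.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(s, i - 1);
            default:
                return true;
        }
    }

    // Measure m of a stem [C](VC)^m[V]: count vowel-to-consonant transitions.
    static int measure(String s) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < s.length(); i++) {
            boolean consonant = isConsonant(s, i);
            if (consonant && prevVowel) m++;
            prevVowel = !consonant;
        }
        return m;
    }

    static String stem(String word) {
        for (List<Rule> step : STEPS) {
            for (Rule rule : step) {
                if (!word.endsWith(rule.suffix())) continue; // suffix does not match
                String base = word.substring(0, word.length() - rule.suffix().length());
                if (measure(base) > rule.minMeasure()) {
                    word = base + rule.replacement(); // rule accepted: it fires
                    break;
                }
                // condition failed: try the next rule in this step
            }
            // whether or not a rule fired, control passes to the next step
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caresses", "ponies", "cats", "relational", "rational"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Run as-is, it prints caresses -> caress, ponies -> poni, cats -> cat, relational -> relate and rational -> ration. In "rational", the step 2 rules match but fail their m > 0 condition, so nothing fires there and the AL rule fires later in step 4.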
Note that this class is NOT multi-thread safe.
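Because a single instance must not be shared across threads, one common Java pattern is to give each thread its own instance through `ThreadLocal`. To stay self-contained, the sketch below uses a hypothetical `StatefulStemmer` with a toy plural rule as a stand-in for a stemmer that keeps mutable working state in instance fields; the class names and the rule are illustrative assumptions, not Smile code.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PerThreadStemmer {
    // Hypothetical stand-in for a stemmer that keeps mutable working state in
    // instance fields; sharing one instance across threads could corrupt it.
    static class StatefulStemmer {
        private final StringBuilder buffer = new StringBuilder();

        String stem(String word) {
            buffer.setLength(0);
            buffer.append(word);
            // Toy rule only: strip a trailing "s" that is not part of "ss".
            int n = buffer.length();
            if (n > 1 && buffer.charAt(n - 1) == 's' && buffer.charAt(n - 2) != 's') {
                buffer.setLength(n - 1);
            }
            return buffer.toString();
        }
    }

    // One private instance per thread, created lazily on first use.
    private static final ThreadLocal<StatefulStemmer> STEMMER =
            ThreadLocal.withInitial(StatefulStemmer::new);

    static List<String> stemAll(List<String> words) {
        // Worker threads of the parallel stream each get their own instance;
        // collecting an ordered stream keeps the input order.
        return words.parallelStream()
                .map(w -> STEMMER.get().stem(w))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(stemAll(List.of("cats", "dogs", "caress", "trees")));
    }
}
```

It prints [cat, dog, caress, tree]. With Smile on the classpath, the same pattern would presumably read `ThreadLocal.withInitial(PorterStemmer::new)`, since the class has a public no-argument constructor; creating a fresh instance per task is a simpler alternative when construction is cheap.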
The code is based on a C implementation of the algorithm.
References
- Martin Porter, An algorithm for suffix stripping, Program, 14(3), 130-137, 1980.
Constructor Details
PorterStemmer
public PorterStemmer()
Constructor.
Method Details
stem
Description copied from interface: Stemmer
Transforms a word into its root form.
stripPluralParticiple
Removes plurals and participles.
Parameters:
word - the word.
Returns:
the word without plurals and participles.
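Porter's paper handles plurals in step 1a and the -ed and -ing participle endings in step 1b. The sketch below implements those two steps as the paper states them, assuming lowercase input. That `stripPluralParticiple` corresponds to exactly these steps is an assumption, and the class `Step1Sketch` and its helper methods are hypothetical names, not part of Smile.

```java
public class Step1Sketch {
    // Porter's consonant test: a, e, i, o, u are vowels, and y is a vowel
    // when it follows a consonant; all other letters are consonants.
    static boolean isConsonant(String s, int i) {
        switch (s.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(s, i - 1);
            default:
                return true;
        }
    }

    // Measure m of a stem [C](VC)^m[V]: count vowel-to-consonant transitions.
    static int measure(String s) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < s.length(); i++) {
            boolean consonant = isConsonant(s, i);
            if (consonant && prevVowel) m++;
            prevVowel = !consonant;
        }
        return m;
    }

    // *v*: the stem contains a vowel.
    static boolean containsVowel(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isConsonant(s, i)) return true;
        }
        return false;
    }

    // *d: the stem ends with a double consonant.
    static boolean endsWithDoubleConsonant(String s) {
        int n = s.length();
        return n >= 2 && s.charAt(n - 1) == s.charAt(n - 2) && isConsonant(s, n - 1);
    }

    // *o: the stem ends consonant-vowel-consonant, the last not w, x or y.
    static boolean endsCvc(String s) {
        int n = s.length();
        if (n < 3) return false;
        char last = s.charAt(n - 1);
        return isConsonant(s, n - 3) && !isConsonant(s, n - 2) && isConsonant(s, n - 1)
                && last != 'w' && last != 'x' && last != 'y';
    }

    // Step 1a (plurals): SSES -> SS, IES -> I, SS -> SS, S -> (empty).
    static String step1a(String w) {
        if (w.endsWith("sses") || w.endsWith("ies")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ss")) return w;
        if (w.endsWith("s")) return w.substring(0, w.length() - 1);
        return w;
    }

    // Step 1b (participles): (m>0) EED -> EE, (*v*) ED -> (empty), (*v*) ING -> (empty).
    static String step1b(String w) {
        if (w.endsWith("eed")) {
            String stem = w.substring(0, w.length() - 3);
            return measure(stem) > 0 ? stem + "ee" : w;
        }
        String stem;
        if (w.endsWith("ed") && containsVowel(w.substring(0, w.length() - 2))) {
            stem = w.substring(0, w.length() - 2);
        } else if (w.endsWith("ing") && containsVowel(w.substring(0, w.length() - 3))) {
            stem = w.substring(0, w.length() - 3);
        } else {
            return w;
        }
        // Clean-up applied only when -ed or -ing was actually removed.
        if (stem.endsWith("at") || stem.endsWith("bl") || stem.endsWith("iz")) return stem + "e";
        if (endsWithDoubleConsonant(stem)) {
            char last = stem.charAt(stem.length() - 1);
            return (last == 'l' || last == 's' || last == 'z')
                    ? stem : stem.substring(0, stem.length() - 1);
        }
        if (measure(stem) == 1 && endsCvc(stem)) return stem + "e";
        return stem;
    }

    static String stripPluralParticiple(String word) {
        return step1b(step1a(word));
    }

    public static void main(String[] args) {
        String[] words = {"caresses", "ponies", "agreed", "feed", "plastered",
                          "motoring", "sing", "hopping", "falling", "filing"};
        for (String w : words) {
            System.out.println(w + " -> " + stripPluralParticiple(w));
        }
    }
}
```

For the sample words it prints caresses -> caress, ponies -> poni, agreed -> agree, feed -> feed, plastered -> plaster, motoring -> motor, sing -> sing, hopping -> hop, falling -> fall and filing -> file. The clean-up rules restore a final "e" (conflated -> conflate) or undouble a consonant (hopping -> hop) so that the stem can still match later suffix rules.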