Class PorterStemmer

java.lang.Object
smile.nlp.stemmer.PorterStemmer
All Implemented Interfaces:
Function<String,String>, Stemmer

public class PorterStemmer extends Object implements Stemmer
Porter's stemming algorithm. The stemmer is based on the idea that suffixes in the English language are mostly made up of combinations of smaller and simpler suffixes. It is a linear step stemmer with five steps, each of which applies a set of rules. Within each step, if a suffix rule matches a word, the conditions attached to that rule are tested on the stem that would result from removing the suffix in the way the rule defines. Once a rule passes its conditions, it fires: the suffix is removed and control moves to the next step. If the rule is not accepted, the next rule in the step is tested, until either a rule from that step fires or no rules remain in that step; in both cases control then passes to the next step.
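The rule-testing loop described above can be sketched for a single step as follows. This is a minimal illustration, not the code of this class: the names SuffixStepSketch, Rule, measure and applyStep are hypothetical, and the two sample rules (ATIONAL -> ATE and TIONAL -> TION, both requiring a stem measure m > 0) are taken from step 2 of Porter's paper.

```java
public class SuffixStepSketch {

    /** Hypothetical rule: swap suffix for replacement if the stem's measure exceeds minMeasure. */
    public static final class Rule {
        final String suffix;
        final String replacement;
        final int minMeasure;

        public Rule(String suffix, String replacement, int minMeasure) {
            this.suffix = suffix;
            this.replacement = replacement;
            this.minMeasure = minMeasure;
        }
    }

    /** Porter's consonant test: 'y' is a consonant at position 0 or after a vowel. */
    static boolean isConsonant(String s, int i) {
        switch (s.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(s, i - 1);
            default:
                return true;
        }
    }

    /** Porter's measure m: the number of vowel-to-consonant transitions in the stem. */
    public static int measure(String stem) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < stem.length(); i++) {
            boolean consonant = isConsonant(stem, i);
            if (consonant && prevVowel) {
                m++;
            }
            prevVowel = !consonant;
        }
        return m;
    }

    /** Runs one step: the first rule whose suffix matches and whose condition holds fires. */
    public static String applyStep(String word, Rule[] rules) {
        for (Rule rule : rules) {
            if (!word.endsWith(rule.suffix)) {
                continue; // suffix does not match, test the next rule
            }
            String stem = word.substring(0, word.length() - rule.suffix.length());
            if (measure(stem) > rule.minMeasure) {
                return stem + rule.replacement; // rule fires, the step is finished
            }
            // condition rejected on the candidate stem, test the next rule
        }
        return word; // no rule fired, the word passes through unchanged
    }

    /** Two sample rules from step 2 of Porter's paper, both conditioned on m > 0. */
    public static final Rule[] STEP2_SAMPLE = {
        new Rule("ational", "ate", 0),
        new Rule("tional", "tion", 0),
    };

    public static void main(String[] args) {
        for (String w : new String[] {"relational", "conditional", "rational"}) {
            System.out.println(w + " -> " + applyStep(w, STEP2_SAMPLE));
        }
    }
}
```

With these two rules, "relational" becomes "relate" and "conditional" becomes "condition", while "rational" is left unchanged because both candidate stems ("r" and "ra") have measure 0. A full stemmer would chain five such steps, each with its own rule table and conditions.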

Note that this class is not thread-safe.
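One common way to use a non-thread-safe object from several threads is to give each thread its own instance. The sketch below wraps any Function<String,String> factory in a ThreadLocal; the class name PerThreadStemmer is hypothetical and not part of this library. It relies only on what this page states: PorterStemmer has a public no-argument constructor and implements Function<String,String>. That apply delegates to stem is an assumption.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical wrapper: lazily creates one delegate per thread. */
public final class PerThreadStemmer implements Function<String, String> {

    private final ThreadLocal<Function<String, String>> local;

    /** factory is called once per thread, the first time that thread stems a word. */
    public PerThreadStemmer(Supplier<? extends Function<String, String>> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    @Override
    public String apply(String word) {
        // Each thread only ever touches its own delegate instance.
        return local.get().apply(word);
    }
}
```

Under those assumptions, a shared instance could be created with new PerThreadStemmer(PorterStemmer::new) and then called from any thread. Creating a new PorterStemmer per task is a simpler alternative when tasks are short-lived.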

The code is based on the C implementation of the algorithm.

References

  1. Martin Porter, An algorithm for suffix stripping, Program, 14(3), 130-137, 1980.
  • Constructor Details

    • PorterStemmer

      public PorterStemmer()
      Constructor.
  • Method Details

    • stem

      public String stem(String word)
      Description copied from interface: Stemmer
      Transforms a word into its root form.
      Specified by:
      stem in interface Stemmer
      Parameters:
      word - the word.
      Returns:
      the stem.
    • stripPluralParticiple

      public String stripPluralParticiple(String word)
      Strips plural and participle suffixes from a word.
      Parameters:
      word - the word.
      Returns:
      the word with plural and participle suffixes removed.