smile.nlp.stemmer.PorterStemmer

All Implemented Interfaces: Function<String,String>, Stemmer

public class PorterStemmer extends Object implements Stemmer

Porter's stemming algorithm. The stemmer is based on the idea that suffixes in the English language are mostly combinations of smaller and simpler suffixes. It is a linear step stemmer: it has five steps, each of which applies a set of rules. Within a step, if a suffix rule matches a word, the conditions attached to that rule are tested on the stem that would result from removing the suffix in the way the rule defines. Once a rule passes its conditions, it fires: the suffix is removed and control moves to the next step. If the rule is not accepted, the next rule in the step is tested, until either a rule in that step fires or no rules remain in that step; in both cases control then passes to the next step.
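
The step-and-rule control flow described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code of this class: the names `StepStemmerSketch`, `Rule`, and `STEPS` are hypothetical, the rule table is a small subset of the rules in Porter's paper (three steps rather than five), and a failed condition falls through to the next rule exactly as the paragraph above describes.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch only: StepStemmerSketch, Rule, and STEPS are hypothetical
// names, and the rule table is a small subset of Porter's rules.
final class StepStemmerSketch {

    /** A suffix rule: the suffix to match, its replacement, and a condition on the candidate stem. */
    static final class Rule {
        final String suffix;
        final String replacement;
        final Predicate<String> condition;

        Rule(String suffix, String replacement, Predicate<String> condition) {
            this.suffix = suffix;
            this.replacement = replacement;
            this.condition = condition;
        }
    }

    static final Predicate<String> ALWAYS = stem -> true;
    static final Predicate<String> M_GT_0 = stem -> measure(stem) > 0;

    // Each inner list is one step; rules are tried in order within a step.
    static final List<List<Rule>> STEPS = List.of(
        List.of(new Rule("sses", "ss", ALWAYS), new Rule("ies", "i", ALWAYS),
                new Rule("ss", "ss", ALWAYS), new Rule("s", "", ALWAYS)),
        List.of(new Rule("ational", "ate", M_GT_0), new Rule("tional", "tion", M_GT_0),
                new Rule("fulness", "ful", M_GT_0)),
        List.of(new Rule("ful", "", M_GT_0), new Rule("ness", "", M_GT_0)));

    /** A letter is a vowel if it is a, e, i, o, u, or a 'y' preceded by a consonant. */
    static boolean isVowel(String s, int i) {
        char c = s.charAt(i);
        if ("aeiou".indexOf(c) >= 0) return true;
        return c == 'y' && i > 0 && !isVowel(s, i - 1);
    }

    /** Porter's measure m: the number of vowel-to-consonant transitions in the stem. */
    static int measure(String stem) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < stem.length(); i++) {
            boolean vowel = isVowel(stem, i);
            if (prevVowel && !vowel) m++;
            prevVowel = vowel;
        }
        return m;
    }

    /** Applies one step: the first rule whose suffix matches and whose condition holds fires. */
    static String applyStep(String word, List<Rule> step) {
        for (Rule rule : step) {
            if (word.endsWith(rule.suffix)) {
                String stem = word.substring(0, word.length() - rule.suffix.length());
                if (rule.condition.test(stem)) {
                    return stem + rule.replacement; // rule fires; control moves to the next step
                }
                // Condition failed: fall through and test the next rule in this step.
            }
        }
        return word; // no rule fired; the word passes unchanged to the next step
    }

    /** Runs the steps in order, each on the output of the previous one. */
    static String stem(String word) {
        String result = word;
        for (List<Rule> step : STEPS) {
            result = applyStep(result, step);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(stem("caresses"));    // caress
        System.out.println(stem("relational"));  // relate
        System.out.println(stem("hopefulness")); // hope
    }
}
```

In this sketch "rational" is left unchanged: both "ational" and "tional" match, but the candidate stems "r" and "ra" have measure 0, so neither rule is accepted.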

Note that this class is NOT thread-safe.
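
One common way to use a non-thread-safe object from several threads is to give each thread its own instance. The sketch below assumes that approach; `PerThreadStemmer` is a hypothetical helper, not part of this library, and the demo uses a lowercase function as a stand-in so that it runs without the library on the classpath.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper, not part of the library: lazily creates one delegate
// per thread so that a non-thread-safe Function is never shared across threads.
final class PerThreadStemmer implements Function<String, String> {
    private final ThreadLocal<Function<String, String>> local;

    PerThreadStemmer(Supplier<? extends Function<String, String>> factory) {
        // The factory is called once per thread, on that thread's first use.
        this.local = ThreadLocal.withInitial(factory);
    }

    @Override
    public String apply(String word) {
        return local.get().apply(word);
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in factory for the demo; since PorterStemmer implements
        // Function<String,String>, PorterStemmer::new could be passed instead.
        PerThreadStemmer stemmer = new PerThreadStemmer(() -> word -> word.toLowerCase());
        Thread worker = new Thread(() -> System.out.println(stemmer.apply("Running")));
        worker.start();
        worker.join();
        System.out.println(stemmer.apply("Jumps"));
    }
}
```

Creating a new stemmer per task, or confining one instance to a single thread, are equally valid alternatives; the wrapper is only convenient when a shared `Function` reference has to be handed to concurrent code.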

The code is based on the C implementation of the algorithm.

References

Martin Porter, An algorithm for suffix stripping, Program, 14(3), 130-137, 1980.

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| PorterStemmer() | Constructor. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| String | stem(String word) | Transforms a word into its root form. |
| String | stripPluralParticiple(String word) | Removes plurals and participles. |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Function
andThen, compose

Methods inherited from interface Stemmer
apply

Constructor Details
- PorterStemmer
  
  public PorterStemmer()
  
  Constructor.

Method Details
- stem
  
  public String stem(String word)
  
  Description copied from interface: Stemmer
  
  Transforms a word into its root form.
  
  Specified by:
  
  stem in interface Stemmer
  
  Parameters:
  
  word - the word.
  
  Returns:
  
  the stem.
- stripPluralParticiple
  
  public String stripPluralParticiple(String word)
  
  Removes plurals and participles.
  
  Parameters:
  
  word - the word.
  
  Returns:
  
  the word without plurals and participles.
