smile.nlp.stemmer.LancasterStemmer

All Implemented Interfaces: Function<String,String>, Stemmer

public class LancasterStemmer extends Object implements Stemmer

The Paice/Husk Lancaster stemming algorithm. It is a conflation-based, iterative stemmer that, while efficient and easy to implement, is known to be very strong and aggressive. The stemmer uses a single table of rules, each of which may specify the removal or replacement of an ending. For details, see the reference below.

References

Paice, Another stemmer, SIGIR Forum, 24(3), 56-61, 1990.
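
The single-table, iterative design described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name ToyRuleStemmer, the three rules, and the minimum stem length are hypothetical and are not the Lancaster rule table or Smile's implementation. It shows only the general shape: find a matching ending, remove or replace it, and repeat until a rule says stop or no rule applies.

```java
import java.util.List;

/** Toy iterative rule-table stemmer. Not Smile's code and not the Lancaster rule set. */
final class ToyRuleStemmer {

    /** One table entry: an ending, its replacement, and whether stemming continues afterwards. */
    private static final class Rule {
        final String ending;
        final String replacement;
        final boolean continueAfter;

        Rule(String ending, String replacement, boolean continueAfter) {
            this.ending = ending;
            this.replacement = replacement;
            this.continueAfter = continueAfter;
        }
    }

    // Hypothetical three-rule table, for illustration only.
    private static final List<Rule> RULES = List.of(
            new Rule("ies", "y", false), // replace the ending, then stop
            new Rule("ing", "", true),   // remove the ending, then try the table again
            new Rule("s", "", true));    // remove the ending, then try the table again

    // Assumed acceptability check: never produce a stem shorter than this.
    private static final int MIN_STEM_LENGTH = 3;

    static String stem(String word) {
        String current = word.toLowerCase();
        boolean keepGoing = true;
        while (keepGoing) {
            keepGoing = false;
            for (Rule rule : RULES) {
                if (!current.endsWith(rule.ending)) {
                    continue;
                }
                String candidate = current.substring(0, current.length() - rule.ending.length())
                        + rule.replacement;
                if (candidate.length() < MIN_STEM_LENGTH) {
                    continue; // reject this rule and try the next one
                }
                current = candidate;
                keepGoing = rule.continueAfter;
                break; // restart from the top of the table (or stop)
            }
        }
        return current;
    }
}
```

With this toy table, ToyRuleStemmer.stem("meetings") first removes "s", then "ing", and returns "meet", while "ponies" becomes "pony" in a single step. A real Paice/Husk table is far larger, which is what makes the stemmer aggressive.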

Constructor Summary

LancasterStemmer() - Constructor with default rules.

LancasterStemmer(boolean stripPrefix) - Constructor with default rules.

LancasterStemmer(InputStream customizedRules) - Constructor with customized rules.

LancasterStemmer(InputStream customizedRules, boolean stripPrefix) - Constructor with customized rules.

Method Summary

String stem(String word) - Transforms a word into its root form.

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Function
andThen, compose

Methods inherited from interface Stemmer
apply

Constructor Details
- LancasterStemmer
  
  public LancasterStemmer()
  
  Constructor with default rules. By default, the stemmer does not strip prefixes from words.
- LancasterStemmer
  
  public LancasterStemmer(boolean stripPrefix)
  
  Constructor with default rules.
  
  Parameters:
  
  stripPrefix - true if the stemmer should strip prefixes such as kilo, micro, milli, intra, ultra, mega, nano, pico, and pseudo.
- LancasterStemmer
  
  public LancasterStemmer(InputStream customizedRules) throws IOException
  
  Constructor with customized rules. By default, the stemmer does not strip prefixes from words.
  
  Parameters:
  
  customizedRules - an input stream to read customized rules.
  
  Throws:
  
  IOException - if the rule file cannot be read.
- LancasterStemmer
  
  public LancasterStemmer(InputStream customizedRules, boolean stripPrefix) throws IOException
  
  Constructor with customized rules.
  
  Parameters:
  
  customizedRules - an input stream to read customized rules.
  
  stripPrefix - true if the stemmer should strip prefixes such as kilo, micro, milli, intra, ultra, mega, nano, pico, and pseudo.
  
  Throws:
  
  IOException - if the rule file cannot be read.

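To make the stripPrefix option concrete, here is a small sketch of what removing one of the listed prefixes could look like. It is an assumption-laden illustration, not Smile's implementation: the class PrefixStripSketch and its method are hypothetical, and details such as case handling or how much of the word must remain after stripping are not stated in this page.

```java
import java.util.List;

/** Illustrative prefix stripping. Hypothetical helper, not part of Smile. */
final class PrefixStripSketch {

    // The prefixes named in the stripPrefix parameter description.
    private static final List<String> PREFIXES = List.of(
            "kilo", "micro", "milli", "intra", "ultra", "mega", "nano", "pico", "pseudo");

    static String stripPrefix(String word) {
        for (String prefix : PREFIXES) {
            // Assumption: strip only when some text remains after the prefix.
            if (word.startsWith(prefix) && word.length() > prefix.length()) {
                return word.substring(prefix.length());
            }
        }
        return word; // no listed prefix found, or the word is the prefix itself
    }
}
```

Under these assumptions, "kilometer" is reduced to "meter" before any ending rules would run, while "mega" and "table" are returned unchanged.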
Method Details
- stem
  
  public String stem(String word)
  
  Description copied from interface: Stemmer
  
  Transforms a word into its root form.
  
  Specified by:
  
  stem in interface Stemmer
  
  Parameters:
  
  word - the word.
  
  Returns:
  
  the stem.
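
Because the class implements Function<String,String>, an instance can be passed wherever a Function is expected, for example to Stream.map or chained with andThen. The sketch below shows that pattern. So that it runs without Smile on the classpath, it uses a hypothetical stand-in stemmer (STAND_IN); the class StemPipelineSketch and its method are likewise illustrative names, not part of the library.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Shows a stemmer used as a plain Function<String,String> in a stream pipeline. */
final class StemPipelineSketch {

    // Hypothetical stand-in so the sketch runs without Smile. With Smile available,
    // a LancasterStemmer instance could be passed to stemAll instead.
    static final Function<String, String> STAND_IN =
            w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;

    /** Trims and lower-cases each token, then applies the supplied stemmer. */
    static List<String> stemAll(List<String> tokens, Function<String, String> stemmer) {
        Function<String, String> normalize = s -> s.trim().toLowerCase();
        return tokens.stream()
                .map(normalize.andThen(stemmer))
                .collect(Collectors.toList());
    }
}
```

Accepting a Function<String,String> rather than a concrete stemmer class keeps the pipeline independent of the stemming algorithm, so a different Stemmer implementation can be swapped in without changing the calling code.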
