Package smile.nlp.stemmer
Class LancasterStemmer
java.lang.Object
smile.nlp.stemmer.LancasterStemmer
- All Implemented Interfaces:
Stemmer
The Paice/Husk Lancaster stemming algorithm, a conflation-based iterative
stemmer. It is efficient and easy to implement, but it is known to be a very
strong and aggressive stemmer. It uses a single table of rules, each of which
may specify the removal or replacement of an ending. For details, see the
references below.
References
- Paice, Another stemmer, SIGIR Forum, 24(3), 56-61, 1990.
- http://www.comp.lancs.ac.uk/computing/research/stemming/Links/paice.htm
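The idea of a single rule table driving an iterative stemmer can be sketched as follows. This is only an illustration: the class ToyRuleStemmer, its three rules, and the minimum stem length are hypothetical and are not the rule set or the acceptance conditions used by LancasterStemmer.

```java
public class ToyRuleStemmer {
    /** One table entry: if a word ends with `ending`, replace it with `replacement`. */
    static final class Rule {
        final String ending;
        final String replacement;      // empty string means plain removal
        final boolean continueAfter;   // true: scan the table again after applying

        Rule(String ending, String replacement, boolean continueAfter) {
            this.ending = ending;
            this.replacement = replacement;
            this.continueAfter = continueAfter;
        }
    }

    // Hypothetical three-rule table, chosen for illustration only.
    static final Rule[] RULES = {
        new Rule("ies", "y", false),  // replacement, then stop
        new Rule("ing", "", true),    // removal, then continue
        new Rule("s", "", true)       // removal, then continue
    };

    // Assumed guard so that short words are not reduced to nothing.
    static final int MIN_STEM_LENGTH = 3;

    static String stem(String word) {
        String stem = word.toLowerCase();
        boolean keepGoing = true;
        while (keepGoing) {
            keepGoing = false;
            for (Rule rule : RULES) {
                if (!stem.endsWith(rule.ending)) {
                    continue;
                }
                String candidate =
                    stem.substring(0, stem.length() - rule.ending.length()) + rule.replacement;
                if (candidate.length() >= MIN_STEM_LENGTH) {
                    stem = candidate;
                    keepGoing = rule.continueAfter;
                    break; // restart from the top of the table, or stop
                }
            }
        }
        return stem;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"ponies", "walkings", "sing"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Here "walkings" is reduced in two passes (first "s", then "ing"), which is the iterative behaviour the description refers to, while "sing" is left alone because removing "ing" would leave too short a stem.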
Constructor Summary
- LancasterStemmer(): Constructor with default rules.
- LancasterStemmer(boolean stripPrefix): Constructor with default rules.
- LancasterStemmer(InputStream customizedRules): Constructor with customized rules.
- LancasterStemmer(InputStream customizedRules, boolean stripPrefix): Constructor with customized rules.
Method Summary
Constructor Details
LancasterStemmer
public LancasterStemmer()
Constructor with default rules. By default, the stemmer does not strip prefixes from words.
LancasterStemmer
public LancasterStemmer(boolean stripPrefix)
Constructor with default rules.
- Parameters:
stripPrefix - true if the stemmer should strip prefixes such as kilo, micro, milli, intra, ultra, mega, nano, pico, and pseudo.
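What prefix stripping means in practice can be illustrated with a small standalone sketch. The class PrefixStripSketch and its stripPrefix helper are hypothetical; the prefix list is the one given above, but the matching logic is an assumption and may differ from what LancasterStemmer actually does.

```java
public class PrefixStripSketch {
    // Prefixes listed in the stripPrefix parameter description.
    static final String[] PREFIXES = {
        "kilo", "micro", "milli", "intra", "ultra", "mega", "nano", "pico", "pseudo"
    };

    /**
     * Removes the first matching prefix, if any.
     * Assumption: a word that consists of the prefix alone is left unchanged.
     */
    static String stripPrefix(String word) {
        for (String prefix : PREFIXES) {
            if (word.startsWith(prefix) && word.length() > prefix.length()) {
                return word.substring(prefix.length());
            }
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"kilometer", "pseudocode", "kilo", "meter"}) {
            System.out.println(w + " -> " + stripPrefix(w));
        }
    }
}
```

With prefix stripping enabled, forms such as "kilometer" and "meter" would conflate to the same stem; with it disabled, they stay distinct.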
LancasterStemmer
public LancasterStemmer(InputStream customizedRules) throws IOException
Constructor with customized rules. By default, the stemmer does not strip prefixes from words.
- Parameters:
customizedRules - an input stream from which the customized rules are read.
- Throws:
IOException - if the rule file cannot be read.
LancasterStemmer
public LancasterStemmer(InputStream customizedRules, boolean stripPrefix) throws IOException
Constructor with customized rules.
- Parameters:
customizedRules - an input stream from which the customized rules are read.
stripPrefix - true if the stemmer should strip prefixes such as kilo, micro, milli, intra, ultra, mega, nano, pico, and pseudo.
- Throws:
IOException - if the rule file cannot be read.
Method Details