Class BreakIteratorTokenizer

java.lang.Object
smile.nlp.tokenizer.BreakIteratorTokenizer
All Implemented Interfaces:
Function<String,String[]>, Tokenizer

public class BreakIteratorTokenizer extends Object implements Tokenizer
A word tokenizer based on java.text.BreakIterator. It supports multiple natural languages; the language-specific word-boundary rules are selected by the locale setting.
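The general BreakIterator technique can be sketched as follows. This is a minimal illustration, not Smile's source. It assumes that a locale-specific word instance (BreakIterator.getWordInstance) drives the splitting and that whitespace-only segments are dropped while punctuation is kept. The class name BreakIteratorSketch and the method tokenize are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper illustrating the BreakIterator technique; it is not
// Smile's source, and its whitespace/punctuation handling is an assumption.
class BreakIteratorSketch {

    // Walks the word boundaries of a locale-specific BreakIterator and keeps
    // every segment that is not pure whitespace (words and punctuation).
    static String[] tokenize(String text, Locale locale) {
        BreakIterator boundary = BreakIterator.getWordInstance(locale);
        boundary.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = boundary.first();
        for (int end = boundary.next(); end != BreakIterator.DONE;
             start = end, end = boundary.next()) {
            String segment = text.substring(start, end);
            if (!segment.trim().isEmpty()) {
                tokens.add(segment);
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("Hello, world", Locale.US);
        System.out.println(String.join(" | ", tokens));
    }
}
```

Because the boundary rules come from the locale, the same loop can segment other languages simply by passing a different Locale.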
  • Constructor Details

    • BreakIteratorTokenizer

      public BreakIteratorTokenizer()
      Constructor for the default locale.
    • BreakIteratorTokenizer

      public BreakIteratorTokenizer(Locale locale)
      Constructor for the given locale.
      Parameters:
      locale - the locale that selects the language-specific word-boundary rules.
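The two constructors follow a common pattern: the no-argument form falls back to the JVM default locale, and the other form takes an explicit Locale. The sketch below mirrors that pattern under stated assumptions. The class LocaleTokenizerSketch, its locale() accessor, and newWordIterator() are hypothetical, and delegating from one constructor to the other is an assumption about, not a description of, Smile's implementation.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical class mirroring the two documented constructors; the field,
// accessor, and use of getWordInstance are assumptions, not Smile's source.
class LocaleTokenizerSketch {
    private final Locale locale;

    // No-argument form: fall back to the JVM's default locale.
    LocaleTokenizerSketch() {
        this(Locale.getDefault());
    }

    // Explicit form: the caller's locale picks the boundary rules.
    LocaleTokenizerSketch(Locale locale) {
        this.locale = locale;
    }

    Locale locale() {
        return locale;
    }

    // Creates a fresh iterator per call, because a BreakIterator holds
    // iteration state (text and position) and should not be shared.
    BreakIterator newWordIterator() {
        return BreakIterator.getWordInstance(locale);
    }

    public static void main(String[] args) {
        LocaleTokenizerSketch byDefault = new LocaleTokenizerSketch();
        LocaleTokenizerSketch german = new LocaleTokenizerSketch(Locale.GERMANY);
        System.out.println(byDefault.locale() + " / " + german.locale());
    }
}
```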
  • Method Details

    • split

      public String[] split(String text)
      Description copied from interface: Tokenizer
      Splits the string into an array of tokens.
      Specified by:
      split in interface Tokenizer
      Parameters:
      text - the text.
      Returns:
      the tokens.
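Since the class implements both Tokenizer and Function<String,String[]>, a tokenizer can be called through split or passed anywhere a Function is expected. The sketch below models that relationship with a hypothetical TokenizerSketch interface whose default apply() delegates to split(). That wiring, the names TokenizerSketch, SplitDemo, and englishWords, and the whitespace filtering are all assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical interface modeled on the documented signatures. The default
// apply() that delegates to split() is an assumption, not Smile's source.
interface TokenizerSketch extends Function<String, String[]> {
    String[] split(String text);

    @Override
    default String[] apply(String text) {
        return split(text);
    }
}

class SplitDemo {

    // Builds an English word tokenizer with the BreakIterator technique,
    // skipping whitespace-only segments.
    static TokenizerSketch englishWords() {
        return text -> {
            BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
            it.setText(text);
            List<String> tokens = new ArrayList<>();
            for (int start = it.first(), end = it.next(); end != BreakIterator.DONE;
                 start = end, end = it.next()) {
                String token = text.substring(start, end);
                if (!token.trim().isEmpty()) {
                    tokens.add(token);
                }
            }
            return tokens.toArray(new String[0]);
        };
    }

    public static void main(String[] args) {
        TokenizerSketch tokenizer = englishWords();
        // Because the tokenizer is also a Function, it can be passed to map().
        List<String[]> tokenized = Arrays.asList("one two", "three").stream()
                .map(tokenizer)
                .collect(Collectors.toList());
        tokenized.forEach(tokens -> System.out.println(Arrays.toString(tokens)));
    }
}
```

Returning String[] rather than a List keeps the result compatible with the Function<String,String[]> type the class is documented to implement.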