Package smile.nlp.pos

Class HMMPOSTagger

java.lang.Object
smile.nlp.pos.HMMPOSTagger
All Implemented Interfaces:
Serializable, POSTagger

public class HMMPOSTagger extends Object implements POSTagger, Serializable
Part-of-speech tagging with a hidden Markov model.
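As a rough guide to what "hidden Markov model" means for tagging, a first-order (bigram) HMM scores a sentence w_1..w_n paired with tags t_1..t_n as shown below. Here π is the initial-tag distribution, a holds the tag-transition probabilities and b the word-emission probabilities, and the tagger returns the highest-scoring tag sequence. The model order and this notation are assumptions made for illustration; this page does not state how Smile parameterizes its model.

```latex
P(w_{1:n}, t_{1:n}) = \pi_{t_1}\, b_{t_1}(w_1) \prod_{i=2}^{n} a_{t_{i-1} t_i}\, b_{t_i}(w_i),
\qquad
\hat{t}_{1:n} = \arg\max_{t_{1:n}} P(w_{1:n}, t_{1:n})
```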
  • Constructor Details

    • HMMPOSTagger

      public HMMPOSTagger()
      Constructor. Creates an empty model; intended for serialization only.
  • Method Details

    • getDefault

      public static HMMPOSTagger getDefault()
      Returns the default English POS tagger.
      Returns:
      the default English POS tagger
    • tag

      public PennTreebankPOS[] tag(String[] sentence)
      Description copied from interface: POSTagger
      Tags the sentence in the form of a sequence of words.
      Specified by:
      tag in interface POSTagger
      Parameters:
      sentence - the sentence, as an array of word tokens.
      Returns:
      the POS tags, one per word.
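      HMM taggers usually find the best tag sequence with Viterbi decoding, which runs in O(n·k²) time for n words and k tags. The sketch below is a generic, stand-alone Viterbi decoder over integer word and tag indices, assuming a first-order model given as log-probabilities. The class ViterbiSketch and its parameters are hypothetical; this is not Smile's implementation.

```java
/**
 * Stand-alone first-order HMM Viterbi decoder (illustrative sketch, not Smile's code).
 * Probabilities are given as natural logs so long sentences do not underflow.
 */
final class ViterbiSketch {
    private ViterbiSketch() {}

    /**
     * @param logPi logPi[s]   = log P(first tag is s)
     * @param logA  logA[r][s] = log P(tag s | previous tag r)
     * @param logB  logB[s][w] = log P(word w | tag s)
     * @param words word indices of the sentence, one per token
     * @return the index of the most likely tag for each token
     */
    static int[] decode(double[] logPi, double[][] logA, double[][] logB, int[] words) {
        int n = words.length;
        int k = logPi.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] delta = new double[n][k]; // best log score of a tag path ending in s at position t
        int[][] back = new int[n][k];        // predecessor tag on that best path
        for (int s = 0; s < k; s++) {
            delta[0][s] = logPi[s] + logB[s][words[0]];
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                int bestPrev = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int r = 0; r < k; r++) {
                    double score = delta[t - 1][r] + logA[r][s];
                    if (score > bestScore) {
                        bestScore = score;
                        bestPrev = r;
                    }
                }
                delta[t][s] = bestScore + logB[s][words[t]];
                back[t][s] = bestPrev;
            }
        }
        // Choose the best final tag, then follow back-pointers to the start.
        int[] path = new int[n];
        double bestFinal = Double.NEGATIVE_INFINITY;
        for (int s = 0; s < k; s++) {
            if (delta[n - 1][s] > bestFinal) {
                bestFinal = delta[n - 1][s];
                path[n - 1] = s;
            }
        }
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

      Working in log space turns the product of probabilities into a sum, so scores for long sentences do not underflow to zero.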
    • fit

      public static HMMPOSTagger fit(String[][] sentences, PennTreebankPOS[][] labels)
      Fits an HMM POS tagger by maximum likelihood estimation.
      Parameters:
      sentences - the training sentences, each an array of word tokens.
      labels - the training labels; labels[i][j] is the tag of word sentences[i][j].
      Returns:
      the model.
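      With fully tagged training data, maximum likelihood estimation of an HMM reduces to counting and normalizing: a[r][s] = count(r followed by s) / count(r followed by any tag), b[s][w] = count(w tagged s) / count(s), and pi[s] = the fraction of sentences whose first tag is s. The stand-alone sketch below does exactly that over integer indices. HmmMleSketch is a hypothetical name, no smoothing is applied, and how Smile handles unseen words or transitions is not described on this page.

```java
/**
 * Maximum-likelihood estimates of first-order HMM parameters from fully tagged
 * sentences (sketch, not Smile's code). Words and tags are integer indices.
 */
final class HmmMleSketch {
    final double[] pi;  // pi[s]   = P(first tag is s)
    final double[][] a; // a[r][s] = P(tag s | previous tag r)
    final double[][] b; // b[s][w] = P(word w | tag s)

    private HmmMleSketch(double[] pi, double[][] a, double[][] b) {
        this.pi = pi;
        this.a = a;
        this.b = b;
    }

    static HmmMleSketch fit(int[][] words, int[][] tags, int numTags, int numWords) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags have different sentence counts");
        }
        double[] pi = new double[numTags];
        double[][] a = new double[numTags][numTags];
        double[][] b = new double[numTags][numWords];
        for (int i = 0; i < words.length; i++) {
            if (words[i].length != tags[i].length) {
                throw new IllegalArgumentException("sentence " + i + ": word and tag counts differ");
            }
            if (tags[i].length == 0) {
                continue;
            }
            pi[tags[i][0]]++; // count sentence-initial tags
            for (int j = 0; j < tags[i].length; j++) {
                b[tags[i][j]][words[i][j]]++; // count (tag, word) emissions
                if (j > 0) {
                    a[tags[i][j - 1]][tags[i][j]]++; // count (previous tag, tag) transitions
                }
            }
        }
        normalize(pi);
        for (double[] row : a) {
            normalize(row);
        }
        for (double[] row : b) {
            normalize(row);
        }
        return new HmmMleSketch(pi, a, b);
    }

    /** Turns counts into probabilities in place; an all-zero row stays all-zero. */
    private static void normalize(double[] row) {
        double sum = 0;
        for (double v : row) {
            sum += v;
        }
        if (sum > 0) {
            for (int i = 0; i < row.length; i++) {
                row[i] /= sum;
            }
        }
    }
}
```

      Because an unseen word gets probability zero under plain counts, a practical tagger needs smoothing or an unknown-word model on top of these estimates.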
    • read

      public static void read(Path dir, List<String[]> sentences, List<PennTreebankPOS[]> labels)
      Loads training data from a corpus.
      Parameters:
      dir - the top directory of training data.
      sentences - the output list of training sentences.
      labels - the output list of training labels.
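      This page does not specify the on-disk format. Penn Treebank-style tagged files typically write each token as word/TAG, with [ and ] marking noun-phrase chunks. The sketch below parses text of that shape, assuming that blank lines separate sentences and that lines starting with "=" are document separators. TaggedTextSketch is hypothetical, returns raw tag strings rather than PennTreebankPOS values, and is not Smile's parser.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical reader for "word/TAG" text (sketch; not Smile's parser). Assumed format:
 * whitespace-separated word/TAG tokens, "[" and "]" chunk brackets, lines starting
 * with "=" as document separators, and blank lines between sentences.
 */
final class TaggedTextSketch {
    private TaggedTextSketch() {}

    /** Returns one list per sentence; each element is a {word, tag} pair. */
    static List<List<String[]>> parse(String text) {
        List<List<String[]>> sentences = new ArrayList<>();
        List<String[]> current = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("=")) {
                continue; // document separator line
            }
            if (trimmed.isEmpty()) { // blank line: close the current sentence, if any
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            for (String token : trimmed.split("\\s+")) {
                if (token.equals("[") || token.equals("]")) {
                    continue; // chunk brackets carry no tag
                }
                // Split at the last '/', so escaped slashes inside the word survive.
                int slash = token.lastIndexOf('/');
                if (slash <= 0 || slash == token.length() - 1) {
                    continue; // not a word/TAG token; skipped
                }
                current.add(new String[] {token.substring(0, slash), token.substring(slash + 1)});
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }
}
```

      Splitting at the last slash keeps a token such as 1\/2/CD intact, yielding the word 1\/2 with the tag CD.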
    • walkin

      public static void walkin(File dir, List<File> files)
      Recursively descends into the directory tree and collects all files whose names end with ".POS".
      Parameters:
      dir - the top directory of training data.
      files - the output list of training files.
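      The same recursive walk can be sketched with java.io.File alone. WalkSketch is a hypothetical stand-in that mirrors the documented signature; it matches the ".POS" suffix case-sensitively, and whether Smile's own method does so is not stated on this page.

```java
import java.io.File;
import java.util.List;

/** Recursively collects files whose names end with ".POS" (sketch mirroring the documented signature). */
final class WalkSketch {
    private WalkSketch() {}

    static void walkin(File dir, List<File> files) {
        File[] children = dir.listFiles(); // null if dir is not a readable directory
        if (children == null) {
            return;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                walkin(child, files); // descend into the subdirectory
            } else if (child.getName().endsWith(".POS")) { // case-sensitive suffix match
                files.add(child);
            }
        }
    }
}
```

      File.listFiles() returns null for a path that is not a readable directory, so the sketch simply returns in that case instead of throwing.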
    • main

      public static void main(String[] args)
      Trains the default model on the WSJ and BROWN datasets.
      Parameters:
      args - the command-line arguments.