Class CRFLabeler<T>

java.lang.Object
smile.sequence.CRFLabeler<T>
All Implemented Interfaces:
Serializable, SequenceLabeler<T>

public class CRFLabeler<T> extends Object implements SequenceLabeler<T>
First-order CRF sequence labeler.
  • Field Details

    • model

      public final CRF model
      The CRF model.
    • features

      public final Function<T,Tuple> features
      The feature function.
  • Constructor Details

    • CRFLabeler

      public CRFLabeler(CRF model, Function<T,Tuple> features)
      Constructor.
      Parameters:
      model - the CRF model.
      features - the feature function.
  • Method Details

    • fit

      public static <T> CRFLabeler<T> fit(T[][] sequences, int[][] labels, Function<T,Tuple> features)
      Fits a CRF model.
      Type Parameters:
      T - the data type of observations.
      Parameters:
      sequences - the training data.
      labels - the training sequence labels.
      features - the feature function.
      Returns:
      the model.
    • fit

      public static <T> CRFLabeler<T> fit(T[][] sequences, int[][] labels, Function<T,Tuple> features, Properties params)
      Fits a CRF model.
      Type Parameters:
      T - the data type of observations.
      Parameters:
      sequences - the training data.
      labels - the training sequence labels.
      features - the feature function.
      params - the hyper-parameters.
      Returns:
      the model.
    • fit

      public static <T> CRFLabeler<T> fit(T[][] sequences, int[][] labels, Function<T,Tuple> features, int ntrees, int maxDepth, int maxNodes, int nodeSize, double shrinkage)
      Fits a CRF.
      Type Parameters:
      T - the data type of observations.
      Parameters:
      sequences - the observation sequences.
      labels - the state labels of the observations. Each label takes a value in [0, k), where k is the number of hidden states.
      features - the feature function.
      ntrees - the number of trees/iterations.
      maxDepth - the maximum depth of the tree.
      maxNodes - the maximum number of leaf nodes in the tree.
      nodeSize - the number of instances in a node below which the tree will not split. Setting nodeSize = 5 generally gives good results.
      shrinkage - the shrinkage parameter in (0, 1], which controls the learning rate of the procedure.
      Returns:
      the model.
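
      The shape and range requirements on sequences and labels can be checked before calling fit. The following is a minimal sketch, not part of the Smile API: the class LabelCheck and the method numStates are hypothetical names, and taking k as the largest label plus one is an assumption made for illustration.

```java
// Hypothetical helper, not part of Smile: validates training data for a
// sequence labeler and derives the number of states k from the labels.
class LabelCheck {
    /**
     * Checks that every observation sequence has a label sequence of the
     * same length and that all labels are non-negative.
     * Returns k = (largest label) + 1, so that labels lie in [0, k).
     */
    static int numStates(Object[][] sequences, int[][] labels) {
        if (sequences.length != labels.length) {
            throw new IllegalArgumentException("sequences and labels differ in count");
        }
        int k = 0;
        for (int i = 0; i < labels.length; i++) {
            if (sequences[i].length != labels[i].length) {
                throw new IllegalArgumentException("length mismatch in sequence " + i);
            }
            for (int y : labels[i]) {
                if (y < 0) {
                    throw new IllegalArgumentException("negative label in sequence " + i);
                }
                k = Math.max(k, y + 1);
            }
        }
        return k;
    }
}
```

      For example, the labels {{0, 2}, {1}} paired with sequences of lengths 2 and 1 pass the check and give k = 3.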
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • predict

      public int[] predict(T[] o)
      Returns the most likely label at each position of the observation sequence, computed with the forward-backward algorithm.
      Specified by:
      predict in interface SequenceLabeler<T>
      Parameters:
      o - the observation sequence.
      Returns:
      the most likely state sequence.
    • viterbi

      public int[] viterbi(T[] o)
      Labels a sequence with the Viterbi algorithm. The Viterbi algorithm returns the single label sequence with the maximum joint probability, which suits applications (e.g. part-of-speech tagging) that require coherent sequential labeling. In contrast, the forward-backward algorithm, used by predict, labels each position individually. This usually produces better per-position accuracy, although the resulting sequence may not be coherent.
      Parameters:
      o - the observation sequence.
      Returns:
      the sequence labels.
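
      The difference between the two decoding strategies can be seen on a toy chain. The sketch below is self-contained and does not use Smile or reproduce its implementation: the class DecodingSketch, its methods, and the potential tables node (per-position scores) and trans (transition scores) are all assumptions made for illustration. It works with plain products of unnormalized scores, which is only adequate for short sequences.

```java
import java.util.Arrays;

// Illustrative only: Viterbi vs. per-position (forward-backward) decoding
// on a chain with unnormalized potentials. Not Smile's implementation.
class DecodingSketch {
    /** Viterbi: the single label path with the highest joint score. */
    static int[] viterbi(double[][] node, double[][] trans) {
        int n = node.length, k = trans.length;
        double[][] best = new double[n][k]; // best score of a path ending in state s at t
        int[][] back = new int[n][k];       // best predecessor of state s at t
        best[0] = node[0].clone();
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                int arg = 0;
                for (int r = 1; r < k; r++) {
                    if (best[t - 1][r] * trans[r][s] > best[t - 1][arg] * trans[arg][s]) arg = r;
                }
                back[t][s] = arg;
                best[t][s] = best[t - 1][arg] * trans[arg][s] * node[t][s];
            }
        }
        int[] path = new int[n];
        path[n - 1] = argmax(best[n - 1]);
        for (int t = n - 1; t > 0; t--) path[t - 1] = back[t][path[t]];
        return path;
    }

    /** Forward-backward: the most likely label at each position separately. */
    static int[] posterior(double[][] node, double[][] trans) {
        int n = node.length, k = trans.length;
        double[][] alpha = new double[n][k];
        double[][] beta = new double[n][k];
        alpha[0] = node[0].clone();
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                double sum = 0.0;
                for (int r = 0; r < k; r++) sum += alpha[t - 1][r] * trans[r][s];
                alpha[t][s] = sum * node[t][s];
            }
        }
        Arrays.fill(beta[n - 1], 1.0);
        for (int t = n - 2; t >= 0; t--) {
            for (int r = 0; r < k; r++) {
                double sum = 0.0;
                for (int s = 0; s < k; s++) sum += trans[r][s] * node[t + 1][s] * beta[t + 1][s];
                beta[t][r] = sum;
            }
        }
        int[] labels = new int[n];
        for (int t = 0; t < n; t++) {
            double[] marginal = new double[k]; // unnormalized marginal at position t
            for (int s = 0; s < k; s++) marginal[s] = alpha[t][s] * beta[t][s];
            labels[t] = argmax(marginal);
        }
        return labels;
    }

    static int argmax(double[] x) {
        int arg = 0;
        for (int i = 1; i < x.length; i++) if (x[i] > x[arg]) arg = i;
        return arg;
    }

    public static void main(String[] args) {
        double[][] node = {{1, 1, 1}, {1, 1, 1}};
        double[][] trans = {
            {0.35, 0.01, 0.01},
            {0.01, 0.30, 0.30},
            {0.01, 0.01, 0.01}
        };
        System.out.println(Arrays.toString(viterbi(node, trans)));   // prints [0, 0]
        System.out.println(Arrays.toString(posterior(node, trans))); // prints [1, 0]
    }
}
```

      In this example the best whole path is (0, 0) with score 0.35. Per-position decoding picks state 1 at the first position, because the paths (1, 1) and (1, 2) together outweigh it, and state 0 at the second. The combined result (1, 0) has a joint score of only 0.01, which illustrates how per-position labels can be individually likely yet incoherent as a sequence.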