package smile.sequence

Sequence labeling algorithms.

Value Members

1. object $dummy

A workaround for scaladoc issue 8124. Users should ignore this object.

2. def crf(sequences: Array[Array[Tuple]], labels: Array[Array[Int]], ntrees: Int = 100, maxDepth: Int = 20, maxNodes: Int = 100, nodeSize: Int = 5, shrinkage: Double = 1.0): CRF

First-order linear conditional random field.

A conditional random field (CRF) is a discriminative undirected probabilistic graphical model. It is most often used for labeling or parsing sequential data.

A CRF is a Markov random field that is trained discriminatively. It therefore does not need to model the distribution over the variables that are always observed, which makes it possible to include arbitrarily complicated features of the observed variables in the model.

References:
• J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML, 2001.
• Thomas G. Dietterich, Guohua Hao, and Adam Ashenfelter. Gradient Tree Boosting for Training Conditional Random Fields. JMLR, 2008.
sequences

the observation attribute sequences.

labels

sequence labels.

ntrees

the number of trees/iterations.

maxDepth

the maximum depth of the tree.

maxNodes

the maximum number of leaf nodes in the tree.

nodeSize

the number of instances in a node below which the tree will not split. Setting nodeSize = 5 generally gives good results.

shrinkage

the shrinkage parameter in (0, 1], which controls the learning rate of the procedure.
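
To make "first-order linear" concrete: in such a model the score of a label sequence decomposes into per-position terms and terms over adjacent label pairs, so the best labeling can be found with Viterbi decoding. The following is a minimal sketch of that idea, not Smile's implementation. The names `LinearChainSketch`, `node`, and `edge` are hypothetical, and the score tables are assumed to be given rather than learned.

```scala
object LinearChainSketch {
  /** Returns the highest-scoring label sequence of a first-order linear chain.
    *
    * node(t)(y) is the (assumed, precomputed) score of label y at position t.
    * edge(a)(b) is the (assumed, precomputed) score of label a followed by label b.
    */
  def viterbi(node: Array[Array[Double]], edge: Array[Array[Double]]): Array[Int] = {
    val n = node.length
    val k = edge.length
    if (n == 0) Array.empty[Int]
    else {
      val best = Array.ofDim[Double](n, k) // best(t)(y): best score of a path ending in y at t
      val back = Array.ofDim[Int](n, k)    // back(t)(y): previous label on that best path
      for (y <- 0 until k) best(0)(y) = node(0)(y)
      for (t <- 1 until n; y <- 0 until k) {
        val prev = (0 until k).maxBy(p => best(t - 1)(p) + edge(p)(y))
        best(t)(y) = best(t - 1)(prev) + edge(prev)(y) + node(t)(y)
        back(t)(y) = prev
      }
      // Follow the back pointers from the best final label.
      val path = new Array[Int](n)
      path(n - 1) = (0 until k).maxBy(y => best(n - 1)(y))
      for (t <- n - 1 to 1 by -1) path(t - 1) = back(t)(path(t))
      path
    }
  }
}
```

With node scores that favor label 1 at the middle position only slightly, edge scores that reward staying in the same label can override the per-position choice. This is the kind of dependency between neighboring labels that a first-order model captures.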

3. def gcrf[T <: AnyRef](sequences: Array[Array[T]], labels: Array[Array[Int]], features: Function[T, Tuple], ntrees: Int = 100, maxDepth: Int = 20, maxNodes: Int = 100, nodeSize: Int = 5, shrinkage: Double = 1.0): CRFLabeler[T]

First-order linear conditional random field.

A conditional random field (CRF) is a discriminative undirected probabilistic graphical model. It is most often used for labeling or parsing sequential data.

A CRF is a Markov random field that is trained discriminatively. It therefore does not need to model the distribution over the variables that are always observed, which makes it possible to include arbitrarily complicated features of the observed variables in the model.

References:
• J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML, 2001.
• Thomas G. Dietterich, Guohua Hao, and Adam Ashenfelter. Gradient Tree Boosting for Training Conditional Random Fields. JMLR, 2008.
sequences

the observation attribute sequences.

labels

sequence labels.

features

the function that maps an observation of type T to its feature Tuple.

ntrees

the number of trees/iterations.

maxDepth

the maximum depth of the tree.

maxNodes

the maximum number of leaf nodes in the tree.

nodeSize

the number of instances in a node below which the tree will not split. Setting nodeSize = 5 generally gives good results.

shrinkage

the shrinkage parameter in (0, 1], which controls the learning rate of the procedure.

4. def hmm[T <: AnyRef](observations: Array[Array[T]], labels: Array[Array[Int]], ordinal: ToIntFunction[T]): HMMLabeler[T]

Trains a first-order Hidden Markov Model.

observations

the observation sequences, whose symbols take values in [0, n), where n is the number of unique symbols.

labels

the state labels of the observations, whose states take values in [0, p), where p is the number of hidden states.

ordinal

a function that returns the ordinal number of an observation symbol.
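
Because the symbols of a generic observation type T must end up as integers in [0, n), the caller has to supply a mapping from symbols to ordinals. The sketch below shows one plausible way to build such a mapping from the training data. The object `OrdinalSketch` and the method `ordinalOf` are hypothetical helpers, not part of Smile. The sketch assumes that T has sensible `equals` and `hashCode` and that every symbol seen later also appears in the training sequences.

```scala
import java.util.function.ToIntFunction

object OrdinalSketch {
  /** Builds an ordinal function that numbers distinct symbols 0, 1, 2, ...
    * in order of first appearance in the training sequences.
    */
  def ordinalOf[T <: AnyRef](observations: Array[Array[T]]): ToIntFunction[T] = {
    val index: Map[T, Int] =
      observations.iterator.flatMap(_.iterator).toSeq.distinct.zipWithIndex.toMap
    new ToIntFunction[T] {
      // Throws NoSuchElementException for a symbol that was never seen.
      override def applyAsInt(symbol: T): Int = index(symbol)
    }
  }
}
```

A function built this way has the type expected by the `ordinal` parameter in the signature above. Whether first-appearance order is an appropriate numbering depends on the application.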

5. def hmm(observations: Array[Array[Int]], labels: Array[Array[Int]]): HMM

First-order Hidden Markov Model.

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network.

In a regular Markov model, the state is directly visible to the observer, so the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible. Each state has a probability distribution over the possible output tokens. The sequence of tokens generated by an HMM therefore gives some information about the sequence of states.

observations

the observation sequences, whose symbols take values in [0, n), where n is the number of unique symbols.

labels

the state labels of the observations, whose states take values in [0, p), where p is the number of hidden states.
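
When both the observations and their state labels are available, the three parameter sets described above (initial state probabilities, state transition probabilities, and per-state output distributions) can be estimated by relative frequency. The following is a minimal sketch of that counting approach under stated assumptions, not Smile's implementation: the names `HmmCountingSketch` and `estimate` are hypothetical, the state and symbol counts are passed in explicitly, and no smoothing is applied.

```scala
object HmmCountingSketch {
  /** Normalizes each row to sum to 1; rows with no counts stay all zero. */
  private def normalize(counts: Array[Array[Double]]): Array[Array[Double]] =
    counts.map { row =>
      val total = row.sum
      if (total == 0.0) row else row.map(_ / total)
    }

  /** Returns (initial, transition, emission) probabilities estimated by counting.
    * Symbols must lie in [0, numSymbols) and states in [0, numStates).
    */
  def estimate(observations: Array[Array[Int]], labels: Array[Array[Int]],
               numStates: Int, numSymbols: Int)
      : (Array[Double], Array[Array[Double]], Array[Array[Double]]) = {
    val pi = new Array[Double](numStates)              // counts of the first state
    val a = Array.ofDim[Double](numStates, numStates)  // counts of state -> next state
    val b = Array.ofDim[Double](numStates, numSymbols) // counts of state -> symbol
    observations.zip(labels).foreach { case (obs, states) =>
      if (obs.nonEmpty) {
        pi(states(0)) += 1.0
        for (t <- obs.indices) {
          b(states(t))(obs(t)) += 1.0
          if (t > 0) a(states(t - 1))(states(t)) += 1.0
        }
      }
    }
    (normalize(Array(pi)).head, normalize(a), normalize(b))
  }
}
```

Each row of the transition matrix is the distribution over the next state, and each row of the emission matrix is that state's distribution over output symbols. A practical estimator would usually add smoothing so that unseen transitions and emissions do not get probability zero.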