package sequence
Sequence labeling algorithms.
Value Members
- def crf(sequences: Array[Array[Tuple]], labels: Array[Array[Int]], ntrees: Int = 100, maxDepth: Int = 20, maxNodes: Int = 100, nodeSize: Int = 5, shrinkage: Double = 1.0): CRF
First-order linear conditional random field. A conditional random field is a type of discriminative undirected probabilistic graphical model. It is most often used for labeling or parsing of sequential data.
A CRF is a Markov random field that is trained discriminatively. It is therefore not necessary to model the distribution over the always-observed variables, which makes it possible to include arbitrarily complicated features of the observed variables in the model.
References:
- J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML, 2001.
- Thomas G. Dietterich, Guohua Hao, and Adam Ashenfelter. Gradient Tree Boosting for Training Conditional Random Fields. JMLR, 2008.
- sequences
the observation attribute sequences.
- labels
sequence labels.
- ntrees
the number of trees/iterations.
- maxDepth
the maximum depth of the tree.
- maxNodes
the maximum number of leaf nodes in the tree.
- nodeSize
the number of instances in a node below which the tree will not split. Setting nodeSize = 5 generally gives good results.
- shrinkage
the shrinkage parameter in (0, 1] that controls the learning rate of the procedure.
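A minimal sketch of how this trainer might be called. The schema construction, the `Tuple.of(row, schema)` argument order, and the `viterbi` decoding call are assumptions about Smile's data API, not taken from this page; real feature vectors would of course be richer.

```scala
import smile.data.Tuple
import smile.data.`type`.{DataTypes, StructField}
import smile.sequence._

// Assumed: a two-feature schema describing each position in a sequence.
val schema = DataTypes.struct(
  new StructField("f1", DataTypes.DoubleType),
  new StructField("f2", DataTypes.DoubleType))

// Toy data: two sequences of length 3, each position a feature vector.
val sequences: Array[Array[Tuple]] = Array(
  Array(Array(0.1, 1.0), Array(0.2, 0.0), Array(0.9, 1.0)).map(row => Tuple.of(row, schema)),
  Array(Array(0.8, 0.0), Array(0.7, 1.0), Array(0.1, 0.0)).map(row => Tuple.of(row, schema)))

val labels: Array[Array[Int]] = Array(Array(0, 0, 1), Array(1, 1, 0))

// Train with a smaller ensemble and learning rate than the defaults.
val model = crf(sequences, labels, ntrees = 50, shrinkage = 0.5)

// Viterbi decoding of a sequence (method name assumed).
val predicted = model.viterbi(sequences(0))
```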
- def gcrf[T <: AnyRef](sequences: Array[Array[T]], labels: Array[Array[Int]], features: Function[T, Tuple], ntrees: Int = 100, maxDepth: Int = 20, maxNodes: Int = 100, nodeSize: Int = 5, shrinkage: Double = 1.0): CRFLabeler[T]
First-order linear conditional random field. A conditional random field is a type of discriminative undirected probabilistic graphical model. It is most often used for labeling or parsing of sequential data.
A CRF is a Markov random field that is trained discriminatively. It is therefore not necessary to model the distribution over the always-observed variables, which makes it possible to include arbitrarily complicated features of the observed variables in the model.
References:
- J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML, 2001.
- Thomas G. Dietterich, Guohua Hao, and Adam Ashenfelter. Gradient Tree Boosting for Training Conditional Random Fields. JMLR, 2008.
- sequences
the observation attribute sequences.
- labels
sequence labels.
- ntrees
the number of trees/iterations.
- maxDepth
the maximum depth of the tree.
- maxNodes
the maximum number of leaf nodes in the tree.
- nodeSize
the number of instances in a node below which the tree will not split. Setting nodeSize = 5 generally gives good results.
- shrinkage
the shrinkage parameter in (0, 1] that controls the learning rate of the procedure.
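Unlike `crf`, this generic variant takes raw observations of any type `T` together with a feature function that converts each observation into a `Tuple`. A sketch for a toy named-entity-style task; the schema, `Tuple.of` usage, and `predict` method name are assumptions about Smile's API:

```scala
import smile.data.Tuple
import smile.data.`type`.{DataTypes, StructField}
import smile.sequence._

// Assumed: two simple per-word features.
val schema = DataTypes.struct(
  new StructField("length", DataTypes.DoubleType),
  new StructField("capitalized", DataTypes.DoubleType))

val sequences: Array[Array[String]] = Array(
  Array("John", "lives", "in", "Paris"),
  Array("Mary", "visited", "London"))

// Toy tag set: 1 = proper noun, 0 = other.
val labels: Array[Array[Int]] = Array(Array(1, 0, 0, 1), Array(1, 0, 1))

// The feature function is passed inline so it converts to the
// expected functional type.
val labeler = gcrf(sequences, labels,
  (word: String) => Tuple.of(
    Array(word.length.toDouble, if (word.head.isUpper) 1.0 else 0.0),
    schema),
  ntrees = 50)

// Label a raw, unseen sequence directly (method name assumed).
val tags = labeler.predict(Array("Bob", "moved", "to", "Berlin"))
```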
- def hmm[T <: AnyRef](observations: Array[Array[T]], labels: Array[Array[Int]], ordinal: ToIntFunction[T]): HMMLabeler[T]
Trains a first-order Hidden Markov Model.
- observations
the observation sequences, of which symbols take values in [0, n), where n is the number of unique symbols.
- labels
the state labels of observations, of which states take values in [0, p), where p is the number of hidden states.
- ordinal
a function mapping an observation of type T to its symbol index in [0, n).
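A sketch of the generic trainer on string-valued observations, where `ordinal` maps each symbol to its integer index. The vocabulary, labels, and the `predict` method name are illustrative assumptions:

```scala
import smile.sequence._

// Toy weather example: observations are activity symbols.
val symbols = Array("walk", "shop", "clean")
val index = symbols.zipWithIndex.toMap

val observations: Array[Array[String]] = Array(
  Array("walk", "walk", "shop"),
  Array("clean", "shop", "walk"))

// Hidden states: 0 = sunny, 1 = rainy (toy labels).
val labels: Array[Array[Int]] = Array(Array(0, 0, 0), Array(1, 1, 0))

// The ordinal function is passed inline so it converts to ToIntFunction.
val labeler = hmm(observations, labels, (s: String) => index(s))

// Decode the hidden states of a new sequence (method name assumed).
val states = labeler.predict(Array("shop", "clean", "clean"))
```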
- def hmm(observations: Array[Array[Int]], labels: Array[Array[Int]]): HMM
First-order Hidden Markov Model. A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered as the simplest dynamic Bayesian network.
In a regular Markov model, the state is directly visible to the observer, so the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible. Each state has a probability distribution over the possible output tokens, so the sequence of tokens generated by an HMM gives some information about the sequence of states.
- observations
the observation sequences, of which symbols take values in [0, n), where n is the number of unique symbols.
- labels
the state labels of observations, of which states take values in [0, p), where p is the number of hidden states.
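When both symbols and states are already integer-coded, the model trains directly on `Array[Array[Int]]`. A minimal sketch; the `predict` call for Viterbi-style decoding is an assumption about the HMM class, not stated on this page:

```scala
import smile.sequence._

// Two observation sequences over 3 symbols {0, 1, 2}.
val observations: Array[Array[Int]] = Array(
  Array(0, 0, 1, 2),
  Array(2, 1, 0, 0))

// Corresponding hidden-state labels over 2 states {0, 1}.
val labels: Array[Array[Int]] = Array(
  Array(0, 0, 1, 1),
  Array(1, 1, 0, 0))

val model = hmm(observations, labels)

// Most likely hidden state sequence for a new observation sequence
// (method name assumed).
val states = model.predict(Array(0, 1, 2, 0))
```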
Smile (Statistical Machine Intelligence and Learning Engine) is a fast and comprehensive machine learning, NLP, linear algebra, graph, interpolation, and visualization system in Java and Scala. With advanced data structures and algorithms, Smile delivers state-of-the-art performance.
Smile covers every aspect of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, genetic algorithms, missing value imputation, efficient nearest neighbor search, etc.