smile.classification.MLP

All Implemented Interfaces: Serializable, AutoCloseable, ToDoubleFunction<double[]>, ToIntFunction<double[]>, Classifier<double[]>

public class MLP extends MultilayerPerceptron implements Classifier<double[]>, Serializable

Fully connected multilayer perceptron neural network for classification. An MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. The nodes are interconnected through weighted acyclic arcs from each preceding layer to the following, without lateral or feedback connections. Each node calculates a transformed weighted linear combination of its inputs (output activations from the preceding layer), with one of the weights acting as a trainable bias connected to a constant input. The transformation, called the activation function, is a bounded, non-decreasing, non-linear function.
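To illustrate what a single node computes, here is a minimal sketch assuming the logistic sigmoid as the activation function. The class `Neuron` and its methods are hypothetical helpers for illustration, not part of Smile's API; the bias is handled as the weight on a constant input of 1, as described above.

```java
/** One MLP node: a weighted sum of its inputs plus a bias, passed through an activation function. */
final class Neuron {

    /** Logistic sigmoid: bounded, non-decreasing and differentiable. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * Returns sigmoid(bias + sum_i weights[i] * inputs[i]).
     * The bias acts as the weight on a constant input of 1.
     */
    static double activate(double[] weights, double bias, double[] inputs) {
        if (weights.length != inputs.length) {
            throw new IllegalArgumentException("weights and inputs must have the same length");
        }
        double z = bias; // contribution of the constant input 1
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * inputs[i];
        }
        return sigmoid(z);
    }
}
```

A layer is simply many such nodes sharing the same inputs, and the next layer takes their outputs as its inputs.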

The representational capabilities of an MLP are determined by the range of mappings it may implement through weight variation. Single-layer perceptrons can solve only linearly separable problems. With the sigmoid function as the activation function, the single-layer network is identical to the logistic regression model.
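To make the contrast with single-layer perceptrons concrete, the sketch below hard-codes a 2-2-1 sigmoid network that computes XOR, a classic problem that is not linearly separable and is therefore beyond any single-layer perceptron. The weights are picked by hand for illustration (they are not learned and not taken from Smile), and the class name `XorNetwork` is hypothetical.

```java
/**
 * Hand-weighted 2-2-1 sigmoid network computing XOR.
 * The hidden layer builds OR and NAND; the output unit combines them with AND.
 */
final class XorNetwork {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Returns 1 if exactly one of a, b is 1, otherwise 0 (inputs are expected to be 0 or 1). */
    static int predict(double a, double b) {
        double h1 = sigmoid(20 * a + 20 * b - 10);    // approximately OR(a, b)
        double h2 = sigmoid(-20 * a - 20 * b + 30);   // approximately NAND(a, b)
        double out = sigmoid(20 * h1 + 20 * h2 - 30); // approximately AND(h1, h2)
        return out > 0.5 ? 1 : 0;
    }
}
```

The hidden layer maps the four input points into a space where a single linear boundary separates the two classes, which is exactly what a single-layer network cannot do on its own.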

The universal approximation theorem for neural networks states that every continuous function that maps intervals of real numbers to some output interval of real numbers can be approximated arbitrarily closely by a multilayer perceptron with just one hidden layer. Some forms of this result hold only for restricted classes of activation functions, which are extremely complex and not smooth for subtle mathematical reasons, whereas smoothness is important for gradient descent learning. Besides, the proof is not constructive: it says nothing about the number of neurons required or the settings of the weights. Therefore, in practice, complex systems use more layers of neurons, and some also use larger input and output layers.

The most popular algorithm for training MLPs is back-propagation, a gradient descent method. Using the chain rule, the algorithm propagates the error backward through the network and adjusts the weight of each connection so as to reduce the value of the error function by a small amount. For this reason, back-propagation can only be applied to networks with differentiable activation functions.
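The chain-rule computation can be sketched for a tiny network with one sigmoid hidden layer, one sigmoid output and the squared error E = 0.5 (o - t)^2. This is a minimal illustration under those assumptions (Smile's own layers, output functions and loss may differ), and `TinyBackprop` is a hypothetical name. It returns the gradient for the hidden-layer weights; the output-layer gradients follow the same pattern (`deltaOut * h[j]`).

```java
/**
 * Back-propagation on a tiny network: inputs -> one sigmoid hidden layer -> one sigmoid output,
 * with squared error E = 0.5 * (o - t)^2.
 */
final class TinyBackprop {
    final double[][] w; // hidden weights: w[j][i] connects input i to hidden unit j
    final double[] b;   // hidden biases
    final double[] v;   // output weights: v[j] connects hidden unit j to the output
    final double c;     // output bias

    TinyBackprop(double[][] w, double[] b, double[] v, double c) {
        this.w = w;
        this.b = b;
        this.v = v;
        this.c = c;
    }

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Forward pass through the hidden layer. */
    double[] hidden(double[] x) {
        double[] h = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double z = b[j];
            for (int i = 0; i < x.length; i++) {
                z += w[j][i] * x[i];
            }
            h[j] = sigmoid(z);
        }
        return h;
    }

    /** Forward pass through the output unit. */
    double output(double[] h) {
        double z = c;
        for (int j = 0; j < h.length; j++) {
            z += v[j] * h[j];
        }
        return sigmoid(z);
    }

    double error(double[] x, double t) {
        double o = output(hidden(x));
        return 0.5 * (o - t) * (o - t);
    }

    /** dE/dw[j][i] for every hidden weight, obtained with the chain rule. */
    double[][] hiddenWeightGradient(double[] x, double t) {
        double[] h = hidden(x);
        double o = output(h);
        // dE/dz_out = dE/do * do/dz_out, using sigmoid'(z) = o * (1 - o)
        double deltaOut = (o - t) * o * (1 - o);
        double[][] grad = new double[w.length][x.length];
        for (int j = 0; j < w.length; j++) {
            // Propagate the error back: dE/dz_j = deltaOut * v[j] * h[j] * (1 - h[j])
            double deltaHidden = deltaOut * v[j] * h[j] * (1 - h[j]);
            for (int i = 0; i < x.length; i++) {
                grad[j][i] = deltaHidden * x[i]; // dz_j/dw[j][i] = x[i]
            }
        }
        return grad;
    }
}
```

A gradient descent step then subtracts the learning rate times each gradient entry from the corresponding weight. Comparing the analytic gradient with a finite-difference estimate is a common way to check such code.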

During error back-propagation, the gradient is usually multiplied by a small number η, called the learning rate, which must be chosen carefully so that the network converges to a local minimum of the error function quickly enough, without producing oscillations. One way to avoid oscillation at large η is to make each weight change depend on the previous weight change by adding a momentum term.
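A minimal sketch of gradient descent with a momentum term, using the common form Δw(t) = -η g(t) + α Δw(t-1), where α is the momentum coefficient. The class `MomentumDescent` is hypothetical, and Smile's exact update (exposed through `setLearningRate` and `setMomentum`) may differ in detail. The helper minimizes f(w) = 0.5 w^2, whose gradient is simply w.

```java
/** Gradient descent with momentum: delta(t) = -eta * grad + alpha * delta(t - 1). */
final class MomentumDescent {
    private final double eta;   // learning rate
    private final double alpha; // momentum coefficient, 0 <= alpha < 1
    private double previousDelta = 0.0;

    MomentumDescent(double eta, double alpha) {
        this.eta = eta;
        this.alpha = alpha;
    }

    /** Returns the updated weight after one step, given the current gradient. */
    double step(double w, double grad) {
        double delta = -eta * grad + alpha * previousDelta; // reuse part of the last change
        previousDelta = delta;
        return w + delta;
    }

    /** Minimizes f(w) = 0.5 * w^2 (gradient w) starting from w0. */
    static double minimizeQuadratic(double w0, double eta, double alpha, int steps) {
        MomentumDescent opt = new MomentumDescent(eta, alpha);
        double w = w0;
        for (int s = 0; s < steps; s++) {
            w = opt.step(w, w);
        }
        return w;
    }
}
```

With a constant gradient the steps grow toward -η g / (1 - α), which speeds progress along consistent directions, while alternating gradients partly cancel, which damps oscillation.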

Although back-propagation may perform gradient descent on the total error over all instances at once (batch learning), the learning rule is often applied to each instance separately (online or stochastic learning). There is empirical evidence that the stochastic approach converges faster.
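The difference between the two regimes can be sketched with a single logistic unit trained by cross-entropy: batch learning makes one update per epoch from the averaged gradient, while online learning updates after every instance. `BatchVsOnline` is a hypothetical class, and instances are visited in a fixed order for reproducibility (practical stochastic training usually shuffles them each epoch).

```java
/** Batch versus online (stochastic) gradient descent for one logistic unit on 1-D inputs. */
final class BatchVsOnline {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Batch: one update per epoch using the gradient averaged over all instances. Returns {w, b}. */
    static double[] trainBatch(double[] x, int[] y, double eta, int epochs) {
        double w = 0.0, b = 0.0;
        for (int e = 0; e < epochs; e++) {
            double gw = 0.0, gb = 0.0;
            for (int i = 0; i < x.length; i++) {
                double err = sigmoid(w * x[i] + b) - y[i]; // d(cross-entropy)/dz
                gw += err * x[i];
                gb += err;
            }
            w -= eta * gw / x.length;
            b -= eta * gb / x.length;
        }
        return new double[]{w, b};
    }

    /** Online: one update per instance. Returns {w, b}. */
    static double[] trainOnline(double[] x, int[] y, double eta, int epochs) {
        double w = 0.0, b = 0.0;
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < x.length; i++) {
                double err = sigmoid(w * x[i] + b) - y[i];
                w -= eta * err * x[i];
                b -= eta * err;
            }
        }
        return new double[]{w, b};
    }
}
```

Per epoch, the online version makes as many updates as there are instances, which is one intuition for the faster convergence often observed in practice.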

In practice, over-fitting is a common problem. It arises in convoluted or over-specified systems, when the capacity of the network significantly exceeds the number of free parameters needed. There are two general approaches to avoiding it. The first is to use cross-validation and similar techniques to detect over-fitting and to select hyperparameters that minimize the generalization error. The second is to use some form of regularization. Regularization emerges naturally in a Bayesian framework, where it can be performed by placing a larger prior probability on simpler models, and also in statistical learning theory, where the goal is to minimize both the "empirical risk" and the "structural risk".
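One common form of regularization for neural networks is L2 weight decay, which adds (λ/2)·||w||^2 to the error and therefore λ·w to the gradient. The sketch below shows that generic rule; `WeightDecay` is a hypothetical class, and while Smile exposes `setWeightDecay` and a `lambda` field, its exact update formula is not stated on this page.

```java
/** L2 regularization (weight decay) for a weight vector. */
final class WeightDecay {

    /** Training error plus the L2 penalty: dataLoss + (lambda / 2) * ||w||^2. */
    static double penalizedLoss(double dataLoss, double[] w, double lambda) {
        double sq = 0.0;
        for (double wi : w) {
            sq += wi * wi;
        }
        return dataLoss + 0.5 * lambda * sq;
    }

    /** One gradient step on the penalized loss: w <- w - eta * (grad + lambda * w). */
    static void step(double[] w, double[] grad, double eta, double lambda) {
        for (int i = 0; i < w.length; i++) {
            // lambda * w[i] shrinks every weight toward zero, favoring simpler models
            w[i] -= eta * (grad[i] + lambda * w[i]);
        }
    }
}
```

Larger λ pulls weights harder toward zero; in practice λ itself is a hyperparameter typically chosen by cross-validation, which combines the two approaches.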

For neural networks, the input patterns should usually be scaled or standardized. Commonly, each input variable is scaled into the interval [0, 1] or standardized to have mean 0 and standard deviation 1.
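Both scalings can be sketched column by column on a `double[][]` design matrix, the same layout `fit(double[][] x, int[] y, Properties params)` takes. `Scaling` is a hypothetical helper (Smile has its own feature transforms, not shown here), and it uses the population standard deviation.

```java
/** Column-wise input scaling for a design matrix x (rows are instances, columns are variables). */
final class Scaling {

    /** Rescales every column of x into [0, 1] in place; constant columns become 0. */
    static void minMax(double[][] x) {
        for (int j = 0; j < x[0].length; j++) {
            double lo = Double.POSITIVE_INFINITY;
            double hi = Double.NEGATIVE_INFINITY;
            for (double[] row : x) {
                lo = Math.min(lo, row[j]);
                hi = Math.max(hi, row[j]);
            }
            double range = hi - lo;
            for (double[] row : x) {
                row[j] = range > 0 ? (row[j] - lo) / range : 0.0;
            }
        }
    }

    /** Standardizes every column of x in place to mean 0 and population standard deviation 1. */
    static void standardize(double[][] x) {
        int n = x.length;
        for (int j = 0; j < x[0].length; j++) {
            double mean = 0.0;
            for (double[] row : x) {
                mean += row[j];
            }
            mean /= n;
            double variance = 0.0;
            for (double[] row : x) {
                variance += (row[j] - mean) * (row[j] - mean);
            }
            double sd = Math.sqrt(variance / n);
            for (double[] row : x) {
                row[j] = sd > 0 ? (row[j] - mean) / sd : 0.0;
            }
        }
    }
}
```

These methods compute the statistics from whatever matrix they receive; in practice the minimum, range, mean and standard deviation are computed on the training data and reused unchanged for new instances.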

For penalty functions and output units, the following natural pairings are recommended:

- linear output units with a least-squares penalty function;
- a logistic activation function with a two-class cross-entropy penalty function;
- a softmax activation function with a multi-class cross-entropy penalty function.
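These pairings are natural partly because the derivatives simplify. For a logistic output p = σ(z) and two-class cross-entropy E = -[t log p + (1 - t) log(1 - p)], the factor p(1 - p) from the sigmoid cancels, leaving dE/dz = p - t; softmax with multi-class cross-entropy simplifies the same way. The hypothetical class below checks that identity against a finite difference.

```java
/** Two-class cross-entropy paired with a logistic output unit. */
final class LogisticCrossEntropy {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** E(z) = -[t log p + (1 - t) log(1 - p)] with p = sigmoid(z) and target t in {0, 1}. */
    static double loss(double z, double t) {
        double p = sigmoid(z);
        return -(t * Math.log(p) + (1 - t) * Math.log(1 - p));
    }

    /** Analytic dE/dz: (-t/p + (1-t)/(1-p)) * p(1-p), which simplifies to p - t. */
    static double gradient(double z, double t) {
        return sigmoid(z) - t;
    }
}
```

With a least-squares penalty on the same sigmoid output, the gradient would keep the p(1 - p) factor, which becomes tiny when the unit saturates and slows learning.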

By using a softmax activation function in the output layer of the neural network for categorical target variables, the outputs can be interpreted as posterior class probabilities, which are useful when a decision needs a confidence estimate rather than only a class label.
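A minimal sketch of how softmax outputs become posterior probabilities and a predicted label. The helper `predictFromOutputs` is hypothetical: it only loosely mirrors the shape of `predict(double[] x, double[] posteriori)`, taking output-layer values rather than an input instance, and it is not Smile's implementation.

```java
/** Softmax output layer turned into posterior probabilities and a predicted class index. */
final class SoftmaxOutput {

    /** Softmax over output-layer values z; subtracting the maximum avoids overflow in exp. */
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) {
            max = Math.max(max, v);
        }
        double[] p = new double[z.length];
        double sum = 0.0;
        for (int k = 0; k < z.length; k++) {
            p[k] = Math.exp(z[k] - max);
            sum += p[k];
        }
        for (int k = 0; k < z.length; k++) {
            p[k] /= sum;
        }
        return p;
    }

    /** Copies the posterior probabilities into posteriori and returns the index of the largest one. */
    static int predictFromOutputs(double[] z, double[] posteriori) {
        double[] p = softmax(z);
        System.arraycopy(p, 0, posteriori, 0, p.length);
        int best = 0;
        for (int k = 1; k < p.length; k++) {
            if (p[k] > p[best]) {
                best = k;
            }
        }
        return best;
    }
}
```

The probabilities are non-negative and sum to 1, so the caller gets both the most probable class and how confident the network is in it.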

Nested Class Summary

Nested classes/interfaces inherited from interface smile.classification.Classifier
Classifier.Trainer<T,M extends Classifier<T>>
Field Summary

Fields inherited from class smile.base.mlp.MultilayerPerceptron
clipNorm, clipValue, epsilon, lambda, learningRate, momentum, net, output, p, rho, t, target
Constructor Summary

Constructors

- MLP(LayerBuilder... builders): Constructor.
- MLP(IntSet classes, LayerBuilder... builders): Constructor.

Method Summary

- int[] classes(): Returns the class labels.
- static MLP fit(double[][] x, int[] y, Properties params): Fits an MLP model.
- int numClasses(): Returns the number of classes.
- boolean online(): Returns true if this is an online learner.
- int predict(double[] x): Predicts the class label of an instance.
- int predict(double[] x, double[] posteriori): Predicts the class label of an instance and also calculates the a posteriori probabilities.
- boolean soft(): Returns true if this is a soft classifier that can estimate the a posteriori probabilities of classification.
- void update(double[][] x, int[] y): Updates the model with a mini-batch.
- void update(double[] x, int y): Updates the model with a single sample.

Methods inherited from class smile.base.mlp.MultilayerPerceptron
backpropagate, close, getClipNorm, getClipValue, getLearningRate, getMomentum, getWeightDecay, propagate, setClipNorm, setClipValue, setLearningRate, setMomentum, setParameters, setRMSProp, setWeightDecay, toString, update

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface smile.classification.Classifier
applyAsDouble, applyAsInt, predict, predict, predict, predict, predict, predict, score, update

Constructor Details
- MLP
  
  public MLP(LayerBuilder... builders)
  
  Constructor.
  
  Parameters:
  
  builders - the builders of layers from bottom to top.
- MLP
  
  public MLP(IntSet classes, LayerBuilder... builders)
  
  Constructor.
  
  Parameters:
  
  classes - the class labels.
  
  builders - the builders of layers from bottom to top.
Method Details
- numClasses
  
  public int numClasses()
  
  Description copied from interface: Classifier
  
  Returns the number of classes.
  
  Specified by:
  
  numClasses in interface Classifier<double[]>
  
  Returns:
  
  the number of classes.
- classes
  
  public int[] classes()
  
  Description copied from interface: Classifier
  
  Returns the class labels.
  
  Specified by:
  
  classes in interface Classifier<double[]>
  
  Returns:
  
  the class labels.
- predict
  
  public int predict(double[] x, double[] posteriori)
  
  Description copied from interface: Classifier
  
  Predicts the class label of an instance and also calculates the a posteriori probabilities. Classifiers may NOT support this method, since not all classification algorithms are able to calculate such a posteriori probabilities.
  
  Specified by:
  
  predict in interface Classifier<double[]>
  
  Parameters:
  
  x - an instance to be classified.
  
  posteriori - on output, the a posteriori probabilities.
  
  Returns:
  
  the predicted class label.
- predict
  
  public int predict(double[] x)
  
  Description copied from interface: Classifier
  
  Predicts the class label of an instance.
  
  Specified by:
  
  predict in interface Classifier<double[]>
  
  Parameters:
  
  x - the instance to be classified.
  
  Returns:
  
  the predicted class label.
- soft
  
  public boolean soft()
  
  Description copied from interface: Classifier
  
  Returns true if this is a soft classifier that can estimate the posteriori probabilities of classification.
  
  Specified by:
  
  soft in interface Classifier<double[]>
  
  Returns:
  
  true if soft classifier.
- online
  
  public boolean online()
  
  Description copied from interface: Classifier
  
  Returns true if this is an online learner.
  
  Specified by:
  
  online in interface Classifier<double[]>
  
  Returns:
  
  true if online learner.
- update
  
  public void update(double[] x, int y)
  
  Updates the model with a single sample. RMSProp is not applied.
  
  Specified by:
  
  update in interface Classifier<double[]>
  
  Parameters:
  
  x - the training instance.
  
  y - the training label.
- update
  
  public void update(double[][] x, int[] y)
  
  Updates the model with a mini-batch. RMSProp is applied if rho > 0.
  
  Specified by:
  
  update in interface Classifier<double[]>
  
  Parameters:
  
  x - the training instances.
  
  y - the training labels.
- fit
  
  public static MLP fit(double[][] x, int[] y, Properties params)
  
  Fits an MLP model.
  
  Parameters:
  
  x - the training dataset.
  
  y - the training labels.
  
  params - the hyperparameters.
  
  Returns:
  
  the model.
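The mini-batch `update(double[][] x, int[] y)` described above applies RMSProp when `rho > 0`. As a rough illustration, here is a textbook RMSProp step; it is a sketch, not Smile's source. The class `RmsProp` is hypothetical, and placing epsilon inside the square root is one common variant (others add it after the square root).

```java
/**
 * Textbook RMSProp: keeps a running mean of squared gradients and divides each step
 * by its square root, so the effective step size adapts per weight.
 */
final class RmsProp {
    private final double eta;     // learning rate
    private final double rho;     // decay rate of the running mean, 0 < rho < 1
    private final double epsilon; // small constant that prevents division by zero
    private final double[] meanSquare;

    RmsProp(int size, double eta, double rho, double epsilon) {
        this.eta = eta;
        this.rho = rho;
        this.epsilon = epsilon;
        this.meanSquare = new double[size];
    }

    /** Updates w in place from the gradient of the current mini-batch. */
    void step(double[] w, double[] grad) {
        for (int i = 0; i < w.length; i++) {
            meanSquare[i] = rho * meanSquare[i] + (1 - rho) * grad[i] * grad[i];
            w[i] -= eta * grad[i] / Math.sqrt(meanSquare[i] + epsilon);
        }
    }
}
```

Because each step is divided by the recent gradient magnitude, weights with large and small gradients move at comparable rates, which is one reason RMSProp tolerates a wider range of learning rates than plain gradient descent.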
