public class NeuralNetwork extends java.lang.Object implements OnlineClassifier<double[]>, SoftClassifier<double[]>, java.io.Serializable
The representational capabilities of an MLP are determined by the range of mappings it can implement through weight variation. Single-layer perceptrons are capable of solving only linearly separable problems. With the sigmoid function as activation function, the single-layer network is identical to the logistic regression model.
The universal approximation theorem for neural networks states that every continuous function mapping intervals of real numbers to some output interval of real numbers can be approximated arbitrarily closely by a multilayer perceptron with just one hidden layer. This result holds only for restricted classes of activation functions, which are extremely complex and NOT smooth for subtle mathematical reasons. Smoothness, on the other hand, is important for gradient descent learning. Moreover, the proof is not constructive regarding the number of neurons required or the settings of the weights. Therefore, in practice, networks for complex systems use more hidden layers, and often more input and output neurons.
The most popular algorithm to train MLPs is backpropagation, a gradient descent method. Based on the chain rule, the algorithm propagates the error back through the network and adjusts the weights of each connection so as to reduce the value of the error function by some small amount. For this reason, backpropagation can only be applied to networks with differentiable activation functions.
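For a single logistic output unit with squared error E = (t - o)^2 / 2, one such chain-rule step can be sketched as follows. This is a toy illustration, not Smile's implementation; the class and method names are hypothetical:

```java
// Hypothetical sketch of one backpropagation step for a single
// logistic (sigmoid) output unit with squared error.
public class BackpropStep {
    static double sigmoid(double v) {
        return 1.0 / (1.0 + Math.exp(-v));
    }

    /** Returns the weights after one gradient-descent step of size eta. */
    public static double[] step(double[] w, double[] x, double target, double eta) {
        // Forward pass: net input and sigmoid output.
        double net = 0.0;
        for (int i = 0; i < w.length; i++) net += w[i] * x[i];
        double o = sigmoid(net);
        // Chain rule: dE/dw_i = -(t - o) * o * (1 - o) * x_i
        double delta = (target - o) * o * (1 - o);
        double[] updated = w.clone();
        for (int i = 0; i < w.length; i++) updated[i] += eta * delta * x[i];
        return updated;
    }
}
```

Note that the derivative o * (1 - o) exists everywhere, which is exactly why the sigmoid qualifies as a differentiable activation function for backpropagation.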
During error backpropagation, the gradient is usually multiplied by a small number η, called the learning rate, which is carefully selected to ensure that the network converges to a local minimum of the error function fast enough, without producing oscillations. One way to avoid oscillation at large η is to make the change in weight dependent on the past weight change by adding a momentum term.
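As a minimal sketch (not Smile's internal code; all names are hypothetical), the momentum update for a single weight combines the current gradient with the previous change:

```java
// Hypothetical illustration of a weight update with a momentum term:
//   delta(t) = -eta * dE/dw + alpha * delta(t-1)
public class MomentumStep {
    /** Returns the new weight change given the gradient and the previous change. */
    public static double delta(double eta, double alpha, double gradient, double lastDelta) {
        return -eta * gradient + alpha * lastDelta;
    }
}
```

With α = 0 this reduces to plain gradient descent; a larger α carries over part of the previous step, smoothing successive updates and damping oscillation.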
Although the backpropagation algorithm may perform gradient descent on the total error of all instances in a batch fashion, the learning rule is often applied to each instance separately in an online (stochastic) fashion. There is empirical evidence that the stochastic approach results in faster convergence.
In practice, the problem of overfitting arises in convoluted or overspecified systems, when the capacity of the network significantly exceeds the number of free parameters needed. There are two general approaches to avoiding this problem. The first is to use cross-validation and similar techniques to check for the presence of overfitting and to select hyperparameters that minimize the generalization error. The second is to use some form of regularization, which emerges naturally in a Bayesian framework, where regularization can be performed by placing a larger prior probability on simpler models, but also in statistical learning theory, where the goal is to minimize both the "empirical risk" and the "structural risk".
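The weight decay form of regularization (controlled in this class through setWeightDecay(double)) can be sketched as a modified gradient step. This is an illustrative stand-alone snippet, not Smile's internal code:

```java
// Hypothetical sketch of a gradient step with L2 weight decay:
//   w <- w - eta * (dE/dw + lambda * w)
// The lambda * w term continually shrinks weights toward zero,
// penalizing large weights and so favoring simpler models.
public class WeightDecayStep {
    public static double step(double w, double gradient, double eta, double lambda) {
        return w - eta * (gradient + lambda * w);
    }
}
```

Even with a zero gradient, a positive λ pulls each weight slightly toward zero on every update.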
For neural networks, the input patterns should usually be scaled/standardized. Commonly, each input variable is scaled into the interval [0, 1] or standardized to have mean 0 and standard deviation 1.
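A stand-alone sketch of the mean-0 / standard-deviation-1 option (the class name is hypothetical, not part of this API):

```java
// Hypothetical helper: standardize each input variable (column) of a
// dataset to mean 0 and standard deviation 1 before training.
public class Standardizer {
    public static double[][] standardize(double[][] x) {
        int n = x.length, p = x[0].length;
        double[][] z = new double[n][p];
        for (int j = 0; j < p; j++) {
            double mean = 0.0;
            for (double[] row : x) mean += row[j];
            mean /= n;
            double var = 0.0;
            for (double[] row : x) var += (row[j] - mean) * (row[j] - mean);
            double sd = Math.sqrt(var / n);
            for (int i = 0; i < n; i++) {
                // Constant columns carry no information; map them to 0.
                z[i][j] = sd > 0.0 ? (x[i][j] - mean) / sd : 0.0;
            }
        }
        return z;
    }
}
```

The same column means and standard deviations computed on the training data should be reused when scaling new instances at prediction time.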
For penalty functions and output units, the following natural pairings are recommended:

- linear output units and a least squares penalty function;
- a two-class cross-entropy penalty function and a logistic activation function;
- a multi-class cross-entropy penalty function and a softmax activation function.
Modifier and Type    Class and Description
static class         NeuralNetwork.ActivationFunction: The types of activation functions in the output layer.
static class         NeuralNetwork.ErrorFunction: The types of error functions.
static class         NeuralNetwork.Trainer: Trainer for neural networks.
Constructor and Description
NeuralNetwork(NeuralNetwork.ErrorFunction error, int... numUnits): Constructor.
NeuralNetwork(NeuralNetwork.ErrorFunction error, NeuralNetwork.ActivationFunction activation, int... numUnits): Constructor.
Modifier and Type    Method and Description
NeuralNetwork        clone()
double               getLearningRate(): Returns the learning rate.
double               getMomentum(): Returns the momentum factor.
double[][]           getWeight(int layer): Returns the weights of a layer.
double               getWeightDecay(): Returns the weight decay factor.
void                 learn(double[][] x, int[] y): Trains the neural network with the given dataset for one epoch by stochastic gradient descent.
double               learn(double[] x, double[] y, double weight): Update the neural network with the given instance and associated target value.
void                 learn(double[] x, int y): Online update the classifier with a new training instance.
void                 learn(double[] x, int y, double weight): Online update the neural network with a new training instance.
int                  predict(double[] x): Predict the class of a given instance.
int                  predict(double[] x, double[] y): Predict the target value of a given instance.
void                 setLearningRate(double eta): Sets the learning rate.
void                 setMomentum(double alpha): Sets the momentum factor.
void                 setWeightDecay(double lambda): Sets the weight decay factor.
Methods inherited from class java.lang.Object:
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface SoftClassifier<double[]>:
predict
public NeuralNetwork(NeuralNetwork.ErrorFunction error, int... numUnits)
Constructor.
Parameters:
error - the error function.
numUnits - the number of units in each layer.

public NeuralNetwork(NeuralNetwork.ErrorFunction error, NeuralNetwork.ActivationFunction activation, int... numUnits)
Constructor.
Parameters:
error - the error function.
activation - the activation function of output layer.
numUnits - the number of units in each layer.

public NeuralNetwork clone()
Overrides:
clone in class java.lang.Object
public void setLearningRate(double eta)
Sets the learning rate.
Parameters:
eta - the learning rate.

public double getLearningRate()
Returns the learning rate.

public void setMomentum(double alpha)
Sets the momentum factor.
Parameters:
alpha - the momentum factor.

public double getMomentum()
Returns the momentum factor.

public void setWeightDecay(double lambda)
Sets the weight decay factor.
Parameters:
lambda - the weight decay for regularization.

public double getWeightDecay()
Returns the weight decay factor.
public double[][] getWeight(int layer)
Returns the weights of a layer.
Parameters:
layer - the layer of the neural network, 0 for input layer.

public int predict(double[] x, double[] y)
Predict the target value of a given instance.
Specified by:
predict in interface SoftClassifier<double[]>
Parameters:
x - the instance.
y - the array to store the network output. For the softmax activation function, these are the estimated a posteriori probabilities.

public int predict(double[] x)
Predict the class of a given instance.
Specified by:
predict in interface Classifier<double[]>
Parameters:
x - the instance.

public double learn(double[] x, double[] y, double weight)
Update the neural network with the given instance and associated target value.
Parameters:
x - the training instance.
y - the target value.
weight - a positive weight value associated with the training instance.

public void learn(double[] x, int y)
Online update the classifier with a new training instance.
Specified by:
learn in interface OnlineClassifier<double[]>
Parameters:
x - training instance.
y - training label.

public void learn(double[] x, int y, double weight)
Online update the neural network with a new training instance.
Parameters:
x - training instance.
y - training label.
weight - a positive weight value associated with the training instance.

public void learn(double[][] x, int[] y)
Trains the neural network with the given dataset for one epoch by stochastic gradient descent.
Parameters:
x - training instances.
y - training labels in [0, k), where k is the number of classes.
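Putting the pieces together, a training loop over this API might look like the following sketch. It assumes the Smile classes are on the classpath; x, y, p (number of input features), and k (number of classes) are placeholders, and the enum constant names should be checked against the nested ActivationFunction and ErrorFunction classes in your Smile version:

```java
import smile.classification.NeuralNetwork;

// Hypothetical usage sketch based on the signatures documented above.
NeuralNetwork net = new NeuralNetwork(
        NeuralNetwork.ErrorFunction.CROSS_ENTROPY,
        NeuralNetwork.ActivationFunction.SOFT_MAX,
        p, 50, k);               // p inputs, one hidden layer of 50 units, k outputs
net.setLearningRate(0.1);        // eta
net.setMomentum(0.5);            // alpha
net.setWeightDecay(1e-4);        // lambda
for (int epoch = 0; epoch < 25; epoch++) {
    net.learn(x, y);             // one epoch of stochastic gradient descent
}
double[] posteriori = new double[k];
int label = net.predict(x[0], posteriori);
```

Because learn(double[][], int[]) runs a single epoch, the caller controls the number of epochs; inputs should be standardized before training, as noted in the class description.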