Class LDA

java.lang.Object
smile.classification.AbstractClassifier<double[]>
smile.classification.LDA
All Implemented Interfaces:
Serializable, ToDoubleFunction<double[]>, ToIntFunction<double[]>, Classifier<double[]>

public class LDA extends AbstractClassifier<double[]>
Linear discriminant analysis. LDA is based on Bayes decision theory and assumes that the class-conditional probability density functions are normally distributed. LDA also makes the simplifying homoscedasticity assumption (i.e. that the class covariances are identical) and assumes that the covariances have full rank. Under these assumptions, the discriminant function of an input being in a class is purely a linear function of the independent variables.
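For reference, the standard textbook form of the discriminant under these assumptions can be written as follows. The symbols are notation introduced here, not names from the Smile source: \Sigma is the shared covariance matrix, \mu_k the mean of class k, and \pi_k its a priori probability.

```latex
\delta_k(x) = x^\top \Sigma^{-1} \mu_k
            - \tfrac{1}{2}\, \mu_k^\top \Sigma^{-1} \mu_k
            + \log \pi_k,
\qquad
\hat{y}(x) = \arg\max_k \, \delta_k(x)
```

The quadratic term x^\top \Sigma^{-1} x is the same for every class when the covariance is shared, so it cancels in the comparison; that is why the score is linear in x.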

LDA is closely related to ANOVA (analysis of variance) and linear regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. In the other two methods, however, the dependent variable is a numerical quantity, while for LDA it is a categorical variable (i.e. the class label). Logistic regression and probit regression are more similar to LDA, as they also explain a categorical variable. These other methods are preferable in applications where it is not reasonable to assume that the independent variables are normally distributed, which is a fundamental assumption of the LDA method.

One complication in applying LDA (and Fisher's discriminant) to real data occurs when the number of samples does not exceed the number of variables/features. In this case, the covariance estimates do not have full rank, and so cannot be inverted. This is known as the small sample size problem.
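The rank deficiency can be seen on a toy example. The following is a minimal plain-Java sketch, not Smile code; the class name `SmallSampleSketch` and its helper methods are hypothetical and exist only for illustration. It computes the sample covariance of two-feature data and prints its determinant, once with two samples and once with four.

```java
final class SmallSampleSketch {
    /** Unbiased sample covariance of the rows of x (n samples, p features). */
    static double[][] covariance(double[][] x) {
        int n = x.length, p = x[0].length;
        double[] mean = new double[p];
        for (double[] row : x)
            for (int j = 0; j < p; j++) mean[j] += row[j];
        for (int j = 0; j < p; j++) mean[j] /= n;

        double[][] cov = new double[p][p];
        for (double[] row : x)
            for (int i = 0; i < p; i++)
                for (int j = 0; j < p; j++)
                    cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]);
        for (int i = 0; i < p; i++)
            for (int j = 0; j < p; j++) cov[i][j] /= (n - 1);
        return cov;
    }

    /** Determinant of a 2x2 matrix; zero means the matrix cannot be inverted. */
    static double det2(double[][] m) {
        return m[0][0] * m[1][1] - m[0][1] * m[1][0];
    }

    public static void main(String[] args) {
        // Two samples, two features: the covariance has rank 1.
        double[][] two = {{1, 2}, {3, 5}};
        // Four samples, two features: the covariance has full rank.
        double[][] four = {{1, 2}, {3, 5}, {2, 1}, {4, 4}};

        System.out.println(det2(covariance(two)));  // prints 0.0
        System.out.println(det2(covariance(four))); // a positive value
    }
}
```

With only two samples, every centered observation lies on a single line, so the 2x2 covariance estimate is singular no matter which values are chosen.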

  • Constructor Details

    • LDA

      public LDA(double[] priori, double[][] mu, double[] eigen, Matrix scaling)
      Constructor.
      Parameters:
      priori - a priori probabilities of each class.
      mu - the mean vectors of each class.
      eigen - the eigenvalues of the common covariance matrix.
      scaling - the eigenvectors of the common covariance matrix.
    • LDA

      public LDA(double[] priori, double[][] mu, double[] eigen, Matrix scaling, IntSet labels)
      Constructor.
      Parameters:
      priori - a priori probabilities of each class.
      mu - the mean vectors of each class.
      eigen - the eigenvalues of the common covariance matrix.
      scaling - the eigenvectors of the common covariance matrix.
      labels - the class label encoder.
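To make the roles of the constructor arguments concrete, here is a minimal plain-Java sketch of how class scores could be computed from the eigenvalues and eigenvectors of the common covariance matrix. It is an assumption about how these quantities fit together mathematically, not Smile's internal implementation: the class `EigenDiscriminantSketch` is hypothetical, and a `double[][]` (eigenvectors stored as columns) stands in for the `Matrix` type.

```java
final class EigenDiscriminantSketch {
    /**
     * Score of class k in Mahalanobis form:
     * log(prior) - 0.5 * sum_j (u_j . (x - mu))^2 / lambda_j,
     * where u_j is the j-th eigenvector (column j of scaling)
     * and lambda_j the matching eigenvalue.
     */
    static double score(double[] x, double prior, double[] mu,
                        double[] eigen, double[][] scaling) {
        double distance = 0.0;
        for (int j = 0; j < eigen.length; j++) {
            double projection = 0.0;
            for (int i = 0; i < x.length; i++) {
                projection += scaling[i][j] * (x[i] - mu[i]);
            }
            distance += projection * projection / eigen[j];
        }
        return Math.log(prior) - 0.5 * distance;
    }

    /** Returns the index of the class with the highest score. */
    static int predict(double[] x, double[] priori, double[][] mu,
                       double[] eigen, double[][] scaling) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < priori.length; k++) {
            double s = score(x, priori[k], mu[k], eigen, scaling);
            if (s > bestScore) {
                bestScore = s;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Illustrative values: diagonal covariance diag(1, 4), two classes.
        double[] priori = {0.5, 0.5};
        double[][] mu = {{0, 0}, {2, 2}};
        double[] eigen = {1, 4};
        double[][] scaling = {{1, 0}, {0, 1}};

        System.out.println(predict(new double[] {0.2, 0.1}, priori, mu, eigen, scaling)); // prints 0
        System.out.println(predict(new double[] {1.9, 2.5}, priori, mu, eigen, scaling)); // prints 1
    }
}
```

Because the covariance is shared across classes, this Mahalanobis form ranks the classes exactly as a linear discriminant would; the eigendecomposition simply avoids forming the inverse covariance explicitly.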
  • Method Details

    • fit

      public static LDA fit(double[][] x, int[] y)
      Fits linear discriminant analysis.
      Parameters:
      x - training samples.
      y - training labels in [0, k), where k is the number of classes.
      Returns:
      the model.
    • fit

      public static LDA fit(double[][] x, int[] y, Properties params)
      Fits linear discriminant analysis.
      Parameters:
      x - training samples.
      y - training labels.
      params - the hyper-parameters.
      Returns:
      the model.
    • fit

      public static LDA fit(double[][] x, int[] y, double[] priori, double tol)
      Fits linear discriminant analysis.
      Parameters:
      x - training samples.
      y - training labels.
      priori - the a priori probability of each class. If null, it will be estimated from the training data.
      tol - a tolerance to decide if a covariance matrix is singular; it will reject variables whose variance is less than tol^2.
      Returns:
      the model.
    • priori

      public double[] priori()
      Returns a priori probabilities.
      Returns:
      a priori probabilities.
    • predict

      public int predict(double[] x)
      Description copied from interface: Classifier
      Predicts the class label of an instance.
      Parameters:
      x - the instance to be classified.
      Returns:
      the predicted class label.
    • soft

      public boolean soft()
      Description copied from interface: Classifier
      Returns true if this is a soft classifier that can estimate the posteriori probabilities of classification.
      Returns:
      true if soft classifier.
    • predict

      public int predict(double[] x, double[] posteriori)
      Description copied from interface: Classifier
      Predicts the class label of an instance and also calculates the a posteriori probabilities. Classifiers may NOT support this method, since not all classification algorithms are able to calculate a posteriori probabilities.
      Parameters:
      x - an instance to be classified.
      posteriori - a posteriori probabilities on output.
      Returns:
      the predicted class label.
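Several of the parameters documented above can be illustrated without the library. The following plain-Java sketch shows one plausible reading of three of them: estimating `priori` from label frequencies when none is supplied, checking variances against `tol` squared, and turning per-class scores into a `posteriori` array. The class `LdaHelpersSketch` and all of its methods are hypothetical; they are not part of Smile and make no claim about how Smile implements these steps.

```java
import java.util.Arrays;

final class LdaHelpersSketch {
    /** Class frequencies in y, usable as priors; labels must lie in [0, k). */
    static double[] estimatePriors(int[] y, int k) {
        double[] priori = new double[k];
        for (int label : y) priori[label] += 1.0;
        for (int c = 0; c < k; c++) priori[c] /= y.length;
        return priori;
    }

    /** True if any variable's variance falls below tol squared. */
    static boolean hasLowVariance(double[] variances, double tol) {
        for (double v : variances) {
            if (v < tol * tol) return true;
        }
        return false;
    }

    /**
     * Converts log-scale class scores into probabilities (a softmax) written
     * into posteriori, and returns the index of the highest-scoring class.
     */
    static int posteriors(double[] scores, double[] posteriori) {
        int best = 0;
        for (int c = 1; c < scores.length; c++) {
            if (scores[c] > scores[best]) best = c;
        }
        double sum = 0.0;
        for (int c = 0; c < scores.length; c++) {
            // Subtracting the maximum keeps exp() from overflowing.
            posteriori[c] = Math.exp(scores[c] - scores[best]);
            sum += posteriori[c];
        }
        for (int c = 0; c < scores.length; c++) posteriori[c] /= sum;
        return best;
    }

    public static void main(String[] args) {
        int[] y = {0, 0, 1, 2};
        System.out.println(Arrays.toString(estimatePriors(y, 3))); // prints [0.5, 0.25, 0.25]

        System.out.println(hasLowVariance(new double[] {1.0, 1e-10}, 1e-4)); // prints true

        double[] posteriori = new double[3];
        int label = posteriors(new double[] {1.0, 3.0, 2.0}, posteriori);
        System.out.println(label); // prints 1
    }
}
```

With Smile on the classpath, the corresponding documented calls are `LDA.fit(x, y, priori, tol)` to train and `predict(x, posteriori)` to obtain both the label and the probabilities, where `posteriori` is an array the caller allocates and the method fills on output.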