Class NaiveBayes

java.lang.Object
smile.classification.AbstractClassifier<double[]>
smile.classification.NaiveBayes
All Implemented Interfaces:
Serializable, ToDoubleFunction<double[]>, ToIntFunction<double[]>, Classifier<double[]>

public class NaiveBayes extends AbstractClassifier<double[]>
Naive Bayes classifier. A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting.

For a general-purpose naive Bayes classifier that makes no assumptions about the underlying distribution of each variable, this class does not provide a learning method to infer the variable distributions from the training data. Instead, users fit whatever distributions are appropriate to the data themselves, using the various Distribution classes. Although the predict(double[]) method takes an array of double values as the general form of the independent variables, users are free to use any discrete distribution to model categorical or ordinal random variables.
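The decision rule described above can be sketched in plain Java. This is a minimal illustration, not this class's implementation: the names NaiveBayesSketch, gaussian, categorical and predict are hypothetical, and java.util.function.DoubleUnaryOperator stands in for a fitted Distribution by returning a log-density or log-probability. It shows how a double-valued input can carry both a continuous variable and a categorical variable encoded as 0, 1, 2, ...

```java
import java.util.function.DoubleUnaryOperator;

final class NaiveBayesSketch {
    /** Log-density of a univariate Gaussian with mean mu and standard deviation sd. */
    static DoubleUnaryOperator gaussian(double mu, double sd) {
        return x -> {
            double z = (x - mu) / sd;
            return -0.5 * z * z - Math.log(sd) - 0.5 * Math.log(2 * Math.PI);
        };
    }

    /** Log-probability of a categorical variable whose levels are encoded as 0, 1, 2, ... */
    static DoubleUnaryOperator categorical(double... p) {
        return x -> Math.log(p[(int) x]);
    }

    /**
     * Returns the class i that maximizes
     * log prior[i] + sum over j of logp[i][j](x[j]),
     * which is the naive (independence) factorization of the joint probability.
     */
    static int predict(double[] prior, DoubleUnaryOperator[][] logp, double[] x) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < prior.length; i++) {
            double score = Math.log(prior[i]);
            for (int j = 0; j < x.length; j++) {
                score += logp[i][j].applyAsDouble(x[j]);
            }
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical model: two classes, one continuous and one categorical variable.
        double[] prior = {0.5, 0.5};
        DoubleUnaryOperator[][] logp = {
            {gaussian(0.0, 1.0), categorical(0.9, 0.1)},
            {gaussian(3.0, 1.0), categorical(0.2, 0.8)}
        };
        System.out.println(predict(prior, logp, new double[] {0.2, 0}));
        System.out.println(predict(prior, logp, new double[] {2.9, 1}));
    }
}
```

Summing log-probabilities rather than multiplying probabilities is a common choice because the product of many small densities can underflow.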

References

  1. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval, Chapter 13, 2009.
  2. Kevin P. Murphy. Machine Learning: A Probabilistic Perspective, Chapter 3, 2012.
  • Constructor Details

    • NaiveBayes

      public NaiveBayes(double[] priori, Distribution[][] condprob)
      Constructor of general naive Bayes classifier.
      Parameters:
      priori - the a priori probability of each class.
      condprob - the conditional distribution of each variable in each class. In particular, condprob[i][j] is the conditional distribution P(x_j | class i).
    • NaiveBayes

      public NaiveBayes(double[] priori, Distribution[][] condprob, IntSet labels)
      Constructor of general naive Bayes classifier.
      Parameters:
      priori - the a priori probability of each class.
      condprob - the conditional distribution of each variable in each class. In particular, condprob[i][j] is the conditional distribution P(x_j | class i).
      labels - the class label encoder.
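Because the constructors take already-fitted inputs, the caller has to estimate them first. The following is a minimal sketch of that preparation under stated assumptions: every variable is modeled as Gaussian, class labels are 0..k-1, and each class has at least two samples. The names NaiveBayesInputs, priors and gaussianParams are hypothetical, and the result is held in plain arrays. To use it with this class, each {mean, standard deviation} pair would still have to be wrapped in a suitable Distribution instance.

```java
final class NaiveBayesInputs {
    /** Class priors estimated as relative label frequencies; labels are 0..k-1. */
    static double[] priors(int[] y, int k) {
        double[] priori = new double[k];
        for (int label : y) {
            priori[label] += 1.0;
        }
        for (int i = 0; i < k; i++) {
            priori[i] /= y.length;
        }
        return priori;
    }

    /**
     * params[i][j] = {mean, sample standard deviation} of variable j within class i,
     * laid out like condprob[i][j]. Assumes at least two samples per class.
     */
    static double[][][] gaussianParams(double[][] x, int[] y, int k) {
        int p = x[0].length;
        double[][][] params = new double[k][p][2];
        int[] count = new int[k];
        // First pass: per-class sums, then means.
        for (int n = 0; n < x.length; n++) {
            count[y[n]]++;
            for (int j = 0; j < p; j++) {
                params[y[n]][j][0] += x[n][j];
            }
        }
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < p; j++) {
                params[i][j][0] /= count[i];
            }
        }
        // Second pass: per-class squared deviations, then standard deviations.
        for (int n = 0; n < x.length; n++) {
            for (int j = 0; j < p; j++) {
                double d = x[n][j] - params[y[n]][j][0];
                params[y[n]][j][1] += d * d;
            }
        }
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < p; j++) {
                params[i][j][1] = Math.sqrt(params[i][j][1] / (count[i] - 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Hypothetical training data: four instances, two variables, two classes.
        double[][] x = {{1, 10}, {3, 14}, {10, 5}, {14, 7}};
        int[] y = {0, 0, 1, 1};
        double[] priori = priors(y, 2);
        double[][][] params = gaussianParams(x, y, 2);
        System.out.println(priori[0] + " " + priori[1]);
        System.out.println(params[0][0][0] + " " + params[0][0][1]);
    }
}
```

The first index is the class and the second is the variable, matching the condprob[i][j] convention described in the parameter list above.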
  • Method Details

    • priori

      public double[] priori()
      Returns the a priori probabilities.
      Returns:
      the a priori probabilities.
    • predict

      public int predict(double[] x)
      Predict the class of an instance.
      Parameters:
      x - the instance to be classified.
      Returns:
      the predicted class label.
    • soft

      public boolean soft()
      Description copied from interface: Classifier
      Returns true if this is a soft classifier that can estimate the a posteriori probabilities of classification.
      Returns:
      true if soft classifier.
    • predict

      public int predict(double[] x, double[] posteriori)
      Predicts the class of an instance and estimates the a posteriori probabilities.
      Parameters:
      x - the instance to be classified.
      posteriori - the array in which the a posteriori probabilities are stored on output.
      Returns:
      the predicted class label.
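One common way to fill an output array such as posteriori is to normalize the per-class log scores with the log-sum-exp trick. The sketch below illustrates that technique only; it is not taken from this class's source, and the names PosteriorSketch and normalize are hypothetical. It assumes logScore[i] holds log P(class i) plus the summed log-likelihoods of the variables.

```java
final class PosteriorSketch {
    /**
     * Converts unnormalized log scores into probabilities that sum to 1,
     * writes them into posteriori, and returns the index of the largest score.
     */
    static int normalize(double[] logScore, double[] posteriori) {
        int best = 0;
        for (int i = 1; i < logScore.length; i++) {
            if (logScore[i] > logScore[best]) {
                best = i;
            }
        }
        // Subtracting the maximum keeps Math.exp from underflowing to zero
        // for every class when all scores are very negative.
        double max = logScore[best];
        double sum = 0.0;
        for (int i = 0; i < logScore.length; i++) {
            posteriori[i] = Math.exp(logScore[i] - max);
            sum += posteriori[i];
        }
        for (int i = 0; i < logScore.length; i++) {
            posteriori[i] /= sum;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] posteriori = new double[2];
        int label = normalize(new double[] {Math.log(0.1), Math.log(0.3)}, posteriori);
        System.out.println(label + " " + posteriori[0] + " " + posteriori[1]);
    }
}
```

The caller allocates the array with one slot per class, which mirrors the calling convention of predict(double[] x, double[] posteriori).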