public class DiscreteNaiveBayes extends java.lang.Object implements OnlineClassifier<int[]>, SoftClassifier<int[]>
Despite their naive design and apparently oversimplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations and are very popular in Natural Language Processing (NLP).
For document classification in NLP, there are two major ways to set up a naive Bayes classifier: the multinomial model and the Bernoulli model. The multinomial model generates one term from the vocabulary in each position of the document. The multivariate Bernoulli model, or simply Bernoulli model, generates an indicator for each term of the vocabulary, indicating either presence or absence of the term in the document. Of the two models, the Bernoulli model is particularly sensitive to noise features. A Bernoulli naive Bayes classifier requires some form of feature selection, or else its accuracy will be low.
The different generation models imply different estimation strategies and different classification rules. The Bernoulli model estimates P(t|c) as the fraction of documents of class c that contain term t. In contrast, the multinomial model estimates P(t|c) as the fraction of tokens, or fraction of positions, in documents of class c that contain term t. When classifying a test document, the Bernoulli model uses binary occurrence information, ignoring the number of occurrences, whereas the multinomial model keeps track of multiple occurrences. As a result, the Bernoulli model typically makes many mistakes when classifying long documents. However, it was reported that the Bernoulli model works better in sentiment analysis.
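The two estimation strategies can be sketched on a toy class of documents. This is only an illustration under stated assumptions: TermEstimates and its methods are hypothetical names, not part of this class, and the smoothing denominators (2 * sigma for the two Bernoulli outcomes, sigma * p for the p vocabulary terms) are the textbook add-k forms rather than anything confirmed by this page.

```java
/** Hypothetical helper for illustration; not part of DiscreteNaiveBayes. */
final class TermEstimates {
    /** Bernoulli: smoothed fraction of documents of the class that contain term t. */
    static double bernoulli(int[][] docs, int t, double sigma) {
        int containing = 0;
        for (int[] doc : docs) {
            if (doc[t] > 0) containing++;   // presence only, counts are ignored
        }
        // Assumption: add-sigma smoothing over the two outcomes (present, absent).
        return (containing + sigma) / (docs.length + 2 * sigma);
    }

    /** Multinomial: smoothed fraction of tokens of the class that are term t. */
    static double multinomial(int[][] docs, int t, double sigma) {
        int p = docs[0].length;             // vocabulary size
        long termTokens = 0;
        long allTokens = 0;
        for (int[] doc : docs) {
            termTokens += doc[t];           // every occurrence counts
            for (int count : doc) allTokens += count;
        }
        // Assumption: add-sigma smoothing over the p vocabulary terms.
        return (termTokens + sigma) / (allTokens + sigma * p);
    }

    public static void main(String[] args) {
        // Two documents of one class, as term-count vectors over a 3-term vocabulary.
        int[][] docs = { {3, 0, 1}, {1, 0, 0} };
        for (int t = 0; t < 3; t++) {
            System.out.printf("t=%d bernoulli=%.3f multinomial=%.3f%n",
                    t, bernoulli(docs, t, 1.0), multinomial(docs, t, 1.0));
        }
    }
}
```

Term 0 appears in both documents but accounts for four of the five tokens, so the two models assign it different probabilities even on this tiny example.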
The models also differ in how nonoccurring terms are used in classification. They do not affect the classification decision in the multinomial model, but in the Bernoulli model the probability of nonoccurrence is factored in when computing P(c|d). This is because only the Bernoulli model models absence of terms explicitly.
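The effect of nonoccurring terms can be sketched by scoring one document under both models. The class and method names below are hypothetical, and the conditional probabilities are made-up inputs rather than values estimated by this class.

```java
/** Hypothetical helper for illustration; not part of DiscreteNaiveBayes. */
final class AbsenceScoring {
    /** Multinomial log-likelihood: only terms that occur in x contribute. */
    static double multinomialLogLikelihood(int[] x, double[] condprob) {
        double logp = 0.0;
        for (int t = 0; t < x.length; t++) {
            if (x[t] > 0) logp += x[t] * Math.log(condprob[t]);
        }
        return logp;
    }

    /** Bernoulli log-likelihood: every vocabulary term contributes, present or absent. */
    static double bernoulliLogLikelihood(int[] x, double[] condprob) {
        double logp = 0.0;
        for (int t = 0; t < x.length; t++) {
            logp += x[t] > 0 ? Math.log(condprob[t]) : Math.log(1.0 - condprob[t]);
        }
        return logp;
    }

    public static void main(String[] args) {
        int[] x = {2, 0, 0};                  // only term 0 occurs
        double[] a = {0.5, 0.25, 0.25};       // two classes that differ only
        double[] b = {0.5, 0.40, 0.10};       // on the nonoccurring terms
        System.out.println(multinomialLogLikelihood(x, a) == multinomialLogLikelihood(x, b));
        System.out.println(bernoulliLogLikelihood(x, a) == bernoulliLogLikelihood(x, b));
    }
}
```

The two parameter vectors agree on the only term that occurs, so the multinomial scores coincide, while the Bernoulli scores differ because the absent terms are multiplied in as 1 - P(t|c).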
A third setting is the Polya urn model, which simply adds what is seen in the training data twice instead of once. See the reference for more details.
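One way to read "adding twice" is as a weight of two on every observed term count during training. The sketch below rests on that assumption; PolyaUrnCounts is a hypothetical name and the add-k estimate is the same textbook form used for the multinomial model, not a confirmed detail of this class.

```java
/** Hypothetical helper for illustration; not part of DiscreteNaiveBayes. */
final class PolyaUrnCounts {
    /** Adds document x to the per-term counts, weighting each occurrence by weight. */
    static void update(double[] termCounts, int[] x, int weight) {
        for (int t = 0; t < x.length; t++) {
            termCounts[t] += weight * x[t];   // weight 1: multinomial, weight 2: assumed Polya urn
        }
    }

    /** Add-sigma smoothed estimate of P(t|c) from accumulated counts. */
    static double estimate(double[] termCounts, int t, double sigma) {
        double total = 0.0;
        for (double c : termCounts) total += c;
        return (termCounts[t] + sigma) / (total + sigma * termCounts.length);
    }

    public static void main(String[] args) {
        int[] x = {3, 0, 1};
        double[] once = new double[3];
        double[] twice = new double[3];
        update(once, x, 1);
        update(twice, x, 2);
        System.out.printf("multinomial=%.4f polya=%.4f%n",
                estimate(once, 0, 1.0), estimate(twice, 0, 1.0));
    }
}
```

Under this reading, doubling the observed counts leaves the smoothing count unchanged, so the estimates move closer to the raw relative frequencies.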
See Also: Distribution, LDA, QDA, RDA, Serialized Form

Modifier and Type  Class and Description 

static class 
DiscreteNaiveBayes.Model
The generation models of naive Bayes classifier.

Constructor and Description 

DiscreteNaiveBayes(DiscreteNaiveBayes.Model model,
double[] priori,
int p)
Constructor of naive Bayes classifier for document classification.

DiscreteNaiveBayes(DiscreteNaiveBayes.Model model,
double[] priori,
int p,
double sigma,
IntSet labels)
Constructor of naive Bayes classifier for document classification.

DiscreteNaiveBayes(DiscreteNaiveBayes.Model model,
int k,
int p)
Constructor of naive Bayes classifier for document classification.

DiscreteNaiveBayes(DiscreteNaiveBayes.Model model,
int k,
int p,
double sigma,
IntSet labels)
Constructor of naive Bayes classifier for document classification.

Modifier and Type  Method and Description 

int 
predict(int[] x)
Predict the class of an instance.

int 
predict(int[] x,
double[] posteriori)
Predict the class of an instance.

int 
predict(SparseArray x)
Predict the class of an instance.

int 
predict(SparseArray x,
double[] posteriori)
Predict the class of an instance.

double[] 
priori()
Returns a priori probabilities.

void 
update(int[][] x,
int[] y)
Batch learning of naive Bayes classifier on sequences,
which are modeled as a bag of words.

void 
update(int[] x,
int y)
Online learning of naive Bayes classifier on a sequence,
which is modeled as a bag of words.

void 
update(SparseArray[] x,
int[] y)
Batch learning of naive Bayes classifier on sequences,
which are modeled as a bag of words.

void 
update(SparseArray x,
int y)
Online learning of naive Bayes classifier on a sequence,
which is modeled as a bag of words.
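
The online update and soft prediction pattern summarized above can be sketched with a small stand-in class. TinyMultinomialNB is hypothetical and is not this class's implementation: it covers only the multinomial model on dense int[] count vectors, and it assumes an add-one smoothed class prior estimated from the training labels.

```java
/** Illustrative stand-in only; not the DiscreteNaiveBayes implementation. */
final class TinyMultinomialNB {
    private final int k;                 // number of classes
    private final int p;                 // vocabulary size (input dimensionality)
    private final double sigma;          // add-k smoothing count
    private final int[] docsPerClass;    // documents seen per class
    private final long[] tokensPerClass; // tokens seen per class
    private final long[][] termCounts;   // termCounts[c][t]: tokens of term t in class c
    private int n;                       // total documents seen

    TinyMultinomialNB(int k, int p, double sigma) {
        this.k = k;
        this.p = p;
        this.sigma = sigma;
        this.docsPerClass = new int[k];
        this.tokensPerClass = new long[k];
        this.termCounts = new long[k][p];
    }

    /** Online learning: fold one bag-of-words vector into the counts. */
    void update(int[] x, int y) {
        n++;
        docsPerClass[y]++;
        for (int t = 0; t < p; t++) {
            termCounts[y][t] += x[t];
            tokensPerClass[y] += x[t];
        }
    }

    /** Batch learning is repeated online learning. */
    void update(int[][] x, int[] y) {
        for (int i = 0; i < x.length; i++) update(x[i], y[i]);
    }

    /** Returns the most probable class and fills posteriori with P(c|x). */
    int predict(int[] x, double[] posteriori) {
        int best = 0;
        for (int c = 0; c < k; c++) {
            // Assumption: add-one smoothed prior so unseen classes keep nonzero mass.
            double logp = Math.log((docsPerClass[c] + 1.0) / (n + k));
            for (int t = 0; t < p; t++) {
                if (x[t] > 0) {
                    logp += x[t] * Math.log((termCounts[c][t] + sigma) / (tokensPerClass[c] + sigma * p));
                }
            }
            posteriori[c] = logp;
            if (logp > posteriori[best]) best = c;
        }
        // Normalize in a numerically stable way by subtracting the maximum log score.
        double max = posteriori[best];
        double sum = 0.0;
        for (int c = 0; c < k; c++) {
            posteriori[c] = Math.exp(posteriori[c] - max);
            sum += posteriori[c];
        }
        for (int c = 0; c < k; c++) posteriori[c] /= sum;
        return best;
    }

    public static void main(String[] args) {
        TinyMultinomialNB nb = new TinyMultinomialNB(2, 4, 1.0);
        nb.update(new int[][] { {2, 1, 0, 0}, {1, 2, 0, 0} }, new int[] {0, 0});
        nb.update(new int[] {0, 0, 2, 1}, 1);   // further instances can arrive one at a time
        nb.update(new int[] {0, 0, 1, 2}, 1);
        double[] posteriori = new double[2];
        int label = nb.predict(new int[] {1, 1, 0, 0}, posteriori);
        System.out.println(label + " " + java.util.Arrays.toString(posteriori));
    }
}
```

Because the model state is just counts, batch and online updates can be mixed freely, which is the property the update methods above expose.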

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from implemented interfaces: applyAsDouble, applyAsInt, f, predict
public DiscreteNaiveBayes(DiscreteNaiveBayes.Model model, int k, int p)
model - the generation model of naive Bayes classifier.
k - the number of classes.
p - the dimensionality of input space.

public DiscreteNaiveBayes(DiscreteNaiveBayes.Model model, int k, int p, double sigma, IntSet labels)
model - the generation model of naive Bayes classifier.
k - the number of classes.
p - the dimensionality of input space.
sigma - the prior count of add-k smoothing of evidence.
labels - class labels.

public DiscreteNaiveBayes(DiscreteNaiveBayes.Model model, double[] priori, int p)
model - the generation model of naive Bayes classifier.
priori - the priori probability of each class.
p - the dimensionality of input space.

public DiscreteNaiveBayes(DiscreteNaiveBayes.Model model, double[] priori, int p, double sigma, IntSet labels)
model - the generation model of naive Bayes classifier.
priori - the priori probability of each class.
p - the dimensionality of input space.
sigma - the prior count of add-k smoothing of evidence.
labels - class labels.

public double[] priori()
public void update(int[] x, int y)
Specified by: update in interface OnlineClassifier<int[]>
x - training instance.
y - training label.

public void update(SparseArray x, int y)
x - training instance in sparse format.
y - training label.

public void update(int[][] x, int[] y)
Specified by: update in interface OnlineClassifier<int[]>
x - training instances.
y - training labels.

public void update(SparseArray[] x, int[] y)
x - training instances.
y - training labels.

public int predict(int[] x)
Specified by: predict in interface Classifier<int[]>
x - the instance to be classified.

public int predict(int[] x, double[] posteriori)
Specified by: predict in interface SoftClassifier<int[]>
x - the instance to be classified.
posteriori - the array to store a posteriori probabilities on output.

public int predict(SparseArray x)
x - the instance to be classified.

public int predict(SparseArray x, double[] posteriori)
x - the instance to be classified.
posteriori - the array to store a posteriori probabilities on output.