smile.classification.AbstractClassifier<T>

smile.classification.KNN<T>

Type Parameters: T - the data type of model input objects.

All Implemented Interfaces: Serializable, ToDoubleFunction<T>, ToIntFunction<T>, Classifier<T>

public class KNN<T> extends AbstractClassifier<T>

K-nearest neighbor classifier. The k-nearest neighbor algorithm (k-NN) classifies an object by a majority vote of its neighbors: the object is assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification.
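To make the voting rule concrete, here is a minimal brute-force sketch in plain Java. It is not Smile's implementation: it assumes dense double[] features, Euclidean distance, and labels numbered 0 to numClasses - 1, and the class name MajorityVoteKnn is hypothetical. Ties between classes go to the smaller label.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Brute-force k-NN with Euclidean distance and an unweighted majority vote (illustrative only). */
final class MajorityVoteKnn {
    private final double[][] x; // training samples
    private final int[] y;      // training labels, assumed to be 0..numClasses-1
    private final int k;        // number of neighbors that vote
    private final int numClasses;

    MajorityVoteKnn(double[][] x, int[] y, int k) {
        if (k < 1 || k > x.length) throw new IllegalArgumentException("invalid k: " + k);
        this.x = x;
        this.y = y;
        this.k = k;
        this.numClasses = Arrays.stream(y).max().getAsInt() + 1;
    }

    // Squared Euclidean distance preserves the neighbor ordering, so the sqrt is skipped.
    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    int predict(double[] q) {
        // Lazy learning: all of the work happens here, at classification time.
        Integer[] idx = new Integer[x.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> squaredDistance(x[i], q)));

        // Each of the k nearest neighbors casts one vote for its own label.
        int[] votes = new int[numClasses];
        for (int j = 0; j < k; j++) votes[y[idx[j]]]++;

        // Most common class among the k neighbors; ties go to the smaller label.
        int best = 0;
        for (int c = 1; c < numClasses; c++) if (votes[c] > votes[best]) best = c;
        return best;
    }
}
```

For example, with three training points near the origin labeled 0 and three near (5, 5) labeled 1, k = 3 assigns (0.5, 0.5) to class 0 and (5.5, 5.2) to class 1. Sorting all n training points costs O(n log n) per query; a bounded heap of size k or a spatial index such as a KD-tree would reduce that cost.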

The best choice of k depends on the data. Generally, larger values of k reduce the effect of noise on the classification but make the boundaries between classes less distinct. A good k can be selected by various heuristic techniques, such as cross-validation. In binary problems, choosing an odd k is helpful because it avoids tied votes.
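Cross-validation for k can be sketched as leave-one-out: classify each training point from all the others and keep the odd k with the fewest errors. This is a self-contained illustration, not a Smile API; the names KSelection and selectK are hypothetical, and it assumes Euclidean distance and integer labels starting at 0. Restricting the search to odd k follows the tie-avoidance advice for binary problems.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Leave-one-out selection of k for a majority-vote k-NN (illustrative sketch). */
final class KSelection {
    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Predicts the label of x[held] by a k-NN vote over every other training sample. */
    static int predictLeavingOut(double[][] x, int[] y, int held, int k, int numClasses) {
        Integer[] idx = new Integer[x.length - 1];
        for (int i = 0, j = 0; i < x.length; i++) if (i != held) idx[j++] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> squaredDistance(x[i], x[held])));

        int[] votes = new int[numClasses];
        for (int j = 0; j < k; j++) votes[y[idx[j]]]++;
        int best = 0;
        for (int c = 1; c < numClasses; c++) if (votes[c] > votes[best]) best = c;
        return best;
    }

    /** Returns the odd k in [1, maxK] with the fewest leave-one-out errors (smallest k on ties). */
    static int selectK(double[][] x, int[] y, int maxK) {
        int numClasses = Arrays.stream(y).max().getAsInt() + 1;
        int bestK = 1;
        int fewestErrors = Integer.MAX_VALUE;
        for (int k = 1; k <= Math.min(maxK, x.length - 1); k += 2) {
            int errors = 0;
            for (int held = 0; held < x.length; held++) {
                if (predictLeavingOut(x, y, held, k, numClasses) != y[held]) errors++;
            }
            if (errors < fewestErrors) {
                fewestErrors = errors;
                bestK = k;
            }
        }
        return bestK;
    }
}
```

On a small set with one mislabeled point in the middle of a class-0 cluster, k = 1 lets that point mislead all of its neighbors, while k = 3 outvotes it; this is the noise-smoothing effect of larger k described above.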

A drawback of basic majority voting is that classes with more frequent examples tend to dominate the prediction for a new object: because they are so numerous, they tend to appear among the k nearest neighbors. One way to overcome this problem is to weight each neighbor's vote according to its distance from the test point, so that closer neighbors count more.
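A minimal sketch of distance-weighted voting, in which each of the k neighbors contributes 1/d instead of a single vote. Inverse-distance weighting is only one common choice (kernel weights are another), and the class name WeightedVoteKnn is hypothetical, not part of Smile. Passing weighted = false recovers the plain majority vote, which makes the difference easy to compare.

```java
import java.util.Arrays;
import java.util.Comparator;

/** k-NN with optional inverse-distance vote weighting (illustrative sketch). */
final class WeightedVoteKnn {
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Classifies q from its k nearest neighbors. If weighted is true, each neighbor's
     * vote is 1 / distance; otherwise every neighbor counts once (plain majority vote).
     */
    static int predict(double[][] x, int[] y, int numClasses, int k, double[] q, boolean weighted) {
        Integer[] idx = new Integer[x.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance(x[i], q)));

        double[] score = new double[numClasses];
        for (int j = 0; j < k; j++) {
            int n = idx[j];
            // Clamp the distance so that an exact match does not divide by zero.
            double w = weighted ? 1.0 / Math.max(distance(x[n], q), 1e-12) : 1.0;
            score[y[n]] += w;
        }

        int best = 0;
        for (int c = 1; c < numClasses; c++) if (score[c] > score[best]) best = c;
        return best;
    }
}
```

With training points (0, 0) labeled 0 and (3, 0), (0, 3) labeled 1, a query at (0.1, 0) with k = 3 goes to class 1 under the plain vote but to class 0 under the weighted vote, since the single very close neighbor outweighs the two distant ones.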

Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighborhood Components Analysis.
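Both LMNN and NCA learn a linear map L and then compare points with the distance ||L(a - b)||. The sketch below only shows how such a map would be applied at classification time; the matrix is hand-picked as a stand-in for a learned one, and the class name LinearMetric is hypothetical.

```java
/** Distance under a linear transformation L: ||L(a - b)||_2 (illustrative sketch). */
final class LinearMetric {
    private final double[][] l; // d' x d matrix; hand-picked here, learned by LMNN or NCA in practice

    LinearMetric(double[][] l) {
        this.l = l;
    }

    /** Euclidean distance between L*a and L*b, computed as the norm of L*(a - b). */
    double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (double[] row : l) {
            double proj = 0.0;
            for (int j = 0; j < row.length; j++) proj += row[j] * (a[j] - b[j]);
            sum += proj * proj;
        }
        return Math.sqrt(sum);
    }
}
```

With L = [[1, 0], [0, 0.1]], the second feature is shrunk by a factor of 10, so (0, 5) ends up at distance 0.5 from the origin while (1, 0) stays at distance 1. This reverses the Euclidean ordering and therefore changes which points count as nearest neighbors.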

Nearest neighbor rules in effect compute the decision boundary in an implicit manner. It is also possible to compute the decision boundary itself explicitly, and to do so in an efficient manner so that the computational complexity is a function of the boundary complexity.

The nearest neighbor algorithm has strong consistency results. As the amount of data approaches infinity, the algorithm is guaranteed to yield an error rate no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data). k-NN is guaranteed to approach the Bayes error rate for some value of k, where k increases as a function of the number of data points.
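Stated more precisely for M classes, with R^* the Bayes error rate and n the number of training points, the asymptotic 1-NN bound of Cover and Hart and the usual consistency condition for k-NN (Stone's theorem, which holds under mild conditions) read:

```latex
R^{*} \le R_{1\text{-NN}} \le R^{*}\left(2 - \frac{M}{M-1}\,R^{*}\right) \le 2R^{*}
\qquad\text{(asymptotic 1-NN error as } n \to \infty\text{)}

k \to \infty \ \text{ and } \ \frac{k}{n} \to 0 \ \text{ as } n \to \infty
\quad\Longrightarrow\quad R_{k\text{-NN}} \to R^{*}
```

The second condition says that k must grow with n, but more slowly than n, so that the neighborhood still shrinks around each query point.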


Nested Class Summary

Nested classes/interfaces inherited from interface smile.classification.Classifier
Classifier.Trainer<T,M extends Classifier<T>>

Field Summary

Fields inherited from class smile.classification.AbstractClassifier
classes

Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| KNN(KNNSearch<T,T> knn, int[] y, int k) | Constructor. |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| static KNN<double[]> | fit(double[][] x, int[] y) | Fits the 1-NN classifier. |
| static KNN<double[]> | fit(double[][] x, int[] y, int k) | Fits the k-NN classifier. |
| static <T> KNN<T> | fit(T[] x, int[] y, int k, Distance<T> distance) | Fits the k-NN classifier. |
| static <T> KNN<T> | fit(T[] x, int[] y, Distance<T> distance) | Fits the 1-NN classifier. |
| int | predict(T x) | Predicts the class label of an instance. |
| int | predict(T x, double[] posteriori) | Predicts the class label of an instance and also calculates the a posteriori probabilities. |
| boolean | soft() | Returns true if this is a soft classifier that can estimate the posteriori probabilities of classification. |

Methods inherited from class smile.classification.AbstractClassifier
classes, numClasses

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface smile.classification.Classifier
applyAsDouble, applyAsInt, online, predict, predict, predict, predict, predict, predict, score, update, update, update

Constructor Details
- KNN
  
  public KNN(KNNSearch<T,T> knn, int[] y, int k)
  
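  Constructs a k-NN classifier from a k-nearest neighbor search structure over the training instances and their labels.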
  
  Parameters:
  
  knn - k-nearest neighbor search data structure of training instances.
  
  y - training labels.
  
  k - the number of neighbors for classification.
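
The constructor separates neighbor search from voting. Below is a minimal sketch of that split. It uses a hypothetical NeighborSearch interface in place of Smile's KNNSearch<T,T> (whose actual methods are not shown on this page), a brute-force LinearNeighborSearch, and a SearchBackedKnn that only counts votes; all three names are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical stand-in for a k-nearest neighbor search structure. */
interface NeighborSearch<T> {
    /** Returns the indices of the k training instances closest to q. */
    int[] nearest(T q, int k);
}

/** Brute-force search that scans every training instance. */
final class LinearNeighborSearch<T> implements NeighborSearch<T> {
    private final T[] data;
    private final ToDoubleBiFunction<T, T> distance;

    LinearNeighborSearch(T[] data, ToDoubleBiFunction<T, T> distance) {
        this.data = data;
        this.distance = distance;
    }

    @Override
    public int[] nearest(T q, int k) {
        Integer[] idx = new Integer[data.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance.applyAsDouble(data[i], q)));
        int[] result = new int[k];
        for (int j = 0; j < k; j++) result[j] = idx[j];
        return result;
    }
}

/** Classifier built from a search structure, the training labels, and k. */
final class SearchBackedKnn<T> {
    private final NeighborSearch<T> search;
    private final int[] y; // y[i] is the label of the i-th instance indexed by the search
    private final int k;
    private final int numClasses;

    SearchBackedKnn(NeighborSearch<T> search, int[] y, int k) {
        this.search = search;
        this.y = y;
        this.k = k;
        this.numClasses = Arrays.stream(y).max().getAsInt() + 1;
    }

    int predict(T x) {
        int[] votes = new int[numClasses];
        for (int i : search.nearest(x, k)) votes[y[i]]++;
        int best = 0;
        for (int c = 1; c < numClasses; c++) if (votes[c] > votes[best]) best = c;
        return best;
    }
}
```

Because the classifier only sees neighbor indices, the brute-force search could be swapped for a spatial index such as a KD-tree or cover tree without touching the voting code. The labels array must be indexed in the same order as the instances the search structure was built on.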
Method Details
- fit
  
  public static <T> KNN<T> fit(T[] x, int[] y, Distance<T> distance)
  
  Fits the 1-NN classifier.
  
  Type Parameters:
  
  T - the data type.
  
  Parameters:
  
  x - training samples.
  
  y - training labels.
  
  distance - the distance function.
  
  Returns:
  
  the model.
- fit
  
  public static <T> KNN<T> fit(T[] x, int[] y, int k, Distance<T> distance)
  
  Fits the k-NN classifier.
  
  Type Parameters:
  
  T - the data type.
  
  Parameters:
  
  x - training samples.
  
  y - training labels.
  
  k - the number of neighbors.
  
  distance - the distance function.
  
  Returns:
  
  the model.
- fit
  
  public static KNN<double[]> fit(double[][] x, int[] y)
  
  Fits the 1-NN classifier.
  
  Parameters:
  
  x - training samples.
  
  y - training labels.
  
  Returns:
  
  the model.
- fit
  
  public static KNN<double[]> fit(double[][] x, int[] y, int k)
  
  Fits the k-NN classifier.
  
  Parameters:
  
  x - training samples.
  
  y - training labels.
  
  k - the number of neighbors for classification.
  
  Returns:
  
  the model.
- predict
  
  public int predict(T x)
  
  Description copied from interface: Classifier
  
  Predicts the class label of an instance.
  
  Parameters:
  
  x - the instance to be classified.
  
  Returns:
  
  the predicted class label.
- soft
  
  public boolean soft()
  
  Description copied from interface: Classifier
  
  Returns true if this is a soft classifier that can estimate the posteriori probabilities of classification.
  
  Returns:
  
  true if soft classifier.
- predict
  
  public int predict(T x, double[] posteriori)
  
  Description copied from interface: Classifier
  
  Predicts the class label of an instance and also calculates the a posteriori probabilities. Not all classifiers support this method, since not all classification algorithms can estimate a posteriori probabilities.
  
  Parameters:
  
  x - an instance to be classified.
  
  posteriori - a posteriori probabilities on output.
  
  Returns:
  
  the predicted class label.
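
The fit(T[] x, int[] y, int k, Distance<T> distance) overload means any object type can be classified as long as a distance is supplied, and predict(T x, double[] posteriori) fills a caller-allocated array. The sketch below combines both ideas for strings under Levenshtein (edit) distance. It assumes the posteriori estimate is the fraction of the k neighbors in each class, a common choice for k-NN that this page does not confirm for Smile; ToDoubleBiFunction stands in for Smile's Distance<T>, and the class name SoftStringKnn is hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToDoubleBiFunction;

/** k-NN over arbitrary objects with vote-fraction posteriors (illustrative sketch). */
final class SoftStringKnn {
    /**
     * Classifies q and writes posteriori[c] = (neighbors labeled c) / k.
     * The caller allocates posteriori with numClasses slots (an output parameter).
     */
    static <T> int predict(T[] x, int[] y, int numClasses, int k,
                           ToDoubleBiFunction<T, T> distance, T q, double[] posteriori) {
        Integer[] idx = new Integer[x.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance.applyAsDouble(x[i], q)));

        int[] count = new int[numClasses];
        for (int j = 0; j < k; j++) count[y[idx[j]]]++;

        int best = 0;
        for (int c = 0; c < numClasses; c++) {
            posteriori[c] = (double) count[c] / k;
            if (count[c] > count[best]) best = c;
        }
        return best;
    }

    /** Levenshtein distance: minimum single-character insertions, deletions, and substitutions. */
    static double editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, with training words color, colour, colors (class 0) and flavor, flavour, flavors (class 1), the query "colr" with k = 5 has its three closest matches in class 0 (edit distances 1, 2, 2). It is therefore assigned class 0 with posteriori {0.6, 0.4}.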
