smile.validation

Model validation.

object bootstrap

Supertypes
class Object
trait Matchable
class Any
Self type
bootstrap.type
object cv

Supertypes
class Object
trait Matchable
class Any
Self type
cv.type
object loocv

Supertypes
class Object
trait Matchable
class Any
Self type
loocv.type
object validate

Supertypes
class Object
trait Matchable
class Any
Self type
validate.type

Value members

Concrete methods

def accuracy(truth: Array[Int], prediction: Array[Int]): Double

The accuracy is the proportion of true results (both true positives and true negatives) in the population.

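As an illustration (not the library's implementation), accuracy can be computed as the fraction of predictions that match the ground truth:

```scala
// Illustrative sketch of accuracy: matching predictions over total samples.
def accuracy(truth: Array[Int], prediction: Array[Int]): Double = {
  require(truth.length == prediction.length, "arrays must have equal length")
  truth.zip(prediction).count { case (t, p) => t == p }.toDouble / truth.length
}

val truth      = Array(1, 1, 0, 0, 1)
val prediction = Array(1, 0, 0, 0, 1)
println(accuracy(truth, prediction))  // 0.8
```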

def adjustedRandIndex(y1: Array[Int], y2: Array[Int]): Double

Adjusted Rand Index. The adjusted Rand index assumes the generalized hypergeometric distribution as the model of randomness. It has a maximum value of 1, and its expected value is 0 in the case of random clusters. A larger adjusted Rand index means higher agreement between two partitions. The adjusted Rand index is recommended for measuring agreement even when the partitions compared have different numbers of clusters.


def auc(truth: Array[Int], probability: Array[Double]): Double

The area under the curve (AUC). When using normalized units, the area under the curve is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').

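The probabilistic interpretation above suggests a direct (if quadratic-time) sketch via the Mann-Whitney formulation — count how often a positive instance outscores a negative one, with ties worth half. This is for illustration, not the library's implementation:

```scala
// AUC sketch: P(score of a random positive > score of a random negative),
// counting ties as 0.5.
def auc(truth: Array[Int], probability: Array[Double]): Double = {
  val pos = truth.indices.filter(truth(_) == 1).map(probability(_))
  val neg = truth.indices.filter(truth(_) == 0).map(probability(_))
  val wins = for (p <- pos; n <- neg) yield
    if (p > n) 1.0 else if (p == n) 0.5 else 0.0
  wins.sum / (pos.size * neg.size)
}

val truth = Array(1, 1, 0, 0)
val prob  = Array(0.9, 0.6, 0.7, 0.2)
println(auc(truth, prob))  // 0.75
```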

def confusion(truth: Array[Int], prediction: Array[Int]): ConfusionMatrix

Computes the confusion matrix.

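A minimal sketch of the underlying counting, assuming class labels 0..k-1; it returns a raw count matrix indexed as counts(truth)(prediction), whereas the library method returns a ConfusionMatrix object:

```scala
// Confusion matrix sketch: counts(t)(p) = number of samples with
// true label t predicted as p.
def confusion(truth: Array[Int], prediction: Array[Int]): Array[Array[Int]] = {
  val k = (truth ++ prediction).max + 1
  val m = Array.ofDim[Int](k, k)
  for (i <- truth.indices) m(truth(i))(prediction(i)) += 1
  m
}

val m = confusion(Array(0, 0, 1, 1, 2), Array(0, 1, 1, 1, 2))
m.foreach(row => println(row.mkString(" ")))
// 1 1 0
// 0 2 0
// 0 0 1
```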

def crossentropy(truth: Array[Int], probability: Array[Array[Double]]): Double

Cross entropy generalizes the log loss metric to multiclass problems.

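For illustration, multiclass cross entropy averages the negative log of the probability assigned to the true class (a sketch, not the library's code; the clipping constant is a common convention, not taken from the source):

```scala
// Cross entropy sketch: mean of -log(p[true class]), with probabilities
// clipped away from 0 to avoid log(0).
def crossentropy(truth: Array[Int], probability: Array[Array[Double]]): Double = {
  val eps = 1e-15
  -truth.indices.map { i =>
    math.log(math.max(probability(i)(truth(i)), eps))
  }.sum / truth.length
}

val truth = Array(0, 2)
val prob  = Array(Array(0.7, 0.2, 0.1), Array(0.1, 0.1, 0.8))
println(crossentropy(truth, prob))  // ≈ 0.2899
```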

def f1(truth: Array[Int], prediction: Array[Int]): Double

The F-score (or F-measure) considers both the precision and the recall of the test to compute the score. The precision p is the number of correct positive results divided by the number of all positive results, and the recall r is the number of correct positive results divided by the number of positive results that should have been returned.

The traditional or balanced F-score (F1 score) is the harmonic mean of precision and recall, where an F1 score reaches its best value at 1 and worst at 0.
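The harmonic-mean definition can be sketched directly from the confusion counts for the positive class (label 1); this is an illustration, not the library's implementation:

```scala
// F1 sketch: harmonic mean of precision and recall for the positive class.
def f1(truth: Array[Int], prediction: Array[Int]): Double = {
  val tp = truth.indices.count(i => truth(i) == 1 && prediction(i) == 1)
  val fp = truth.indices.count(i => truth(i) == 0 && prediction(i) == 1)
  val fn = truth.indices.count(i => truth(i) == 1 && prediction(i) == 0)
  val p = tp.toDouble / (tp + fp)  // precision
  val r = tp.toDouble / (tp + fn)  // recall
  2 * p * r / (p + r)
}

val truth      = Array(1, 1, 1, 0, 0)
val prediction = Array(1, 1, 0, 1, 0)
println(f1(truth, prediction))  // ≈ 0.6667
```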

def fallout(truth: Array[Int], prediction: Array[Int]): Double

Fall-out, false alarm rate, or false positive rate (FPR). Fall-out is the Type I error rate and is closely related to specificity: fall-out = 1 - specificity.


def fdr(truth: Array[Int], prediction: Array[Int]): Double

The false discovery rate (FDR) is the ratio of false positives to the combined true and false positives, i.e., FDR = 1 - precision.


def logloss(truth: Array[Int], probability: Array[Double]): Double

Log loss is an evaluation metric for binary classifiers, and it is sometimes the optimization objective as well, as in logistic regression and neural networks. Log loss takes into account the uncertainty of the prediction based on how much it varies from the actual label, providing a more nuanced view of the model's performance. In general, minimizing log loss yields greater accuracy for the classifier. However, it is susceptible to imbalanced data.

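As an illustrative sketch (not the library's code), binary log loss averages -log p for positive samples and -log(1 - p) for negative ones, where probability(i) is the predicted probability of the positive class; the clipping constant is a common convention assumed here:

```scala
// Binary log loss sketch, with probabilities clipped away from 0 and 1.
def logloss(truth: Array[Int], probability: Array[Double]): Double = {
  val eps = 1e-15
  -truth.indices.map { i =>
    val p = math.min(math.max(probability(i), eps), 1 - eps)
    if (truth(i) == 1) math.log(p) else math.log(1 - p)
  }.sum / truth.length
}

val truth = Array(1, 0)
val prob  = Array(0.8, 0.1)
println(logloss(truth, prob))  // ≈ 0.1643
```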

def mad(truth: Array[Double], prediction: Array[Double]): Double

Mean absolute deviation error.


def mcc(truth: Array[Int], prediction: Array[Int]): Double

MCC is a correlation coefficient between the predicted and actual values. It is considered a balanced measure for binary classification, even on unbalanced data sets. It varies between -1 and +1: +1 when there is perfect agreement between ground truth and prediction, -1 when there is perfect disagreement. An MCC of 0 means the model is no better than random.

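For illustration, MCC can be sketched from the four cells of the 2x2 confusion matrix (not the library's implementation; the zero-denominator convention of returning 0 is an assumption here):

```scala
// MCC sketch: (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).
def mcc(truth: Array[Int], prediction: Array[Int]): Double = {
  val tp = truth.indices.count(i => truth(i) == 1 && prediction(i) == 1)
  val tn = truth.indices.count(i => truth(i) == 0 && prediction(i) == 0)
  val fp = truth.indices.count(i => truth(i) == 0 && prediction(i) == 1)
  val fn = truth.indices.count(i => truth(i) == 1 && prediction(i) == 0)
  val denom = math.sqrt((tp + fp).toDouble * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) 0.0 else (tp.toDouble * tn - fp.toDouble * fn) / denom
}

println(mcc(Array(1, 1, 0, 0), Array(1, 1, 0, 0)))  // 1.0 (perfect agreement)
```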

def mse(truth: Array[Double], prediction: Array[Double]): Double

Mean squared error.

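An illustrative one-liner (not the library's code): MSE is the average of the squared residuals.

```scala
// MSE sketch: mean of squared differences between truth and prediction.
def mse(truth: Array[Double], prediction: Array[Double]): Double =
  truth.zip(prediction).map { case (t, p) => (t - p) * (t - p) }.sum / truth.length

println(mse(Array(1.0, 2.0, 3.0), Array(1.0, 2.5, 2.0)))  // ≈ 0.4167
```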

def nmi(y1: Array[Int], y2: Array[Int]): Double

Normalized mutual information (normalized by max(H(y1), H(y2))) between two clusterings.

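As a sketch of the computation (not the library's implementation), the mutual information between the two label assignments is divided by the larger of the two entropies, so identical partitions score 1 regardless of how their clusters are numbered:

```scala
// NMI sketch: MI(y1, y2) / max(H(y1), H(y2)), using natural logarithms.
def nmi(y1: Array[Int], y2: Array[Int]): Double = {
  val n = y1.length.toDouble
  def entropy(y: Array[Int]): Double =
    y.groupBy(identity).values.map { g =>
      val p = g.length / n
      -p * math.log(p)
    }.sum
  val mi = y1.zip(y2).groupBy(identity).values.map { g =>
    val pxy = g.length / n                     // joint probability
    val px  = y1.count(_ == g.head._1) / n     // marginal of y1
    val py  = y2.count(_ == g.head._2) / n     // marginal of y2
    pxy * math.log(pxy / (px * py))
  }.sum
  mi / math.max(entropy(y1), entropy(y2))
}

println(nmi(Array(0, 0, 1, 1), Array(1, 1, 0, 0)))  // ≈ 1.0 (same partition, relabeled)
```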

def precision(truth: Array[Int], prediction: Array[Int]): Double

The precision or positive predictive value (PPV) is the ratio of true positives to the combined true and false positives, which is different from sensitivity.

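An illustrative sketch (not the library's code) that makes the contrast with sensitivity concrete: precision divides by the predicted positives, while sensitivity divides by the actual positives.

```scala
// Precision sketch: true positives over all predicted positives (label 1).
def precision(truth: Array[Int], prediction: Array[Int]): Double = {
  val tp = truth.indices.count(i => truth(i) == 1 && prediction(i) == 1)
  tp.toDouble / prediction.count(_ == 1)
}

println(precision(Array(1, 1, 0, 0), Array(1, 1, 1, 0)))  // 2/3 ≈ 0.6667
```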

def randIndex(y1: Array[Int], y2: Array[Int]): Double

The Rand index is defined as the number of pairs of objects that are either in the same group or in different groups in both partitions, divided by the total number of pairs of objects. The Rand index lies between 0 and 1. When two partitions agree perfectly, the Rand index achieves its maximum value of 1. A problem with the Rand index is that its expected value between two random partitions is not a constant. This problem is corrected by the adjusted Rand index.

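The pair-counting definition can be sketched directly (an O(n²) illustration, not the library's implementation):

```scala
// Rand index sketch: a pair agrees if the two partitions make the same
// same-cluster / different-cluster decision for it.
def randIndex(y1: Array[Int], y2: Array[Int]): Double = {
  val n = y1.length
  var agree = 0
  for (i <- 0 until n; j <- i + 1 until n)
    if ((y1(i) == y1(j)) == (y2(i) == y2(j))) agree += 1
  agree.toDouble / (n * (n - 1) / 2)
}

println(randIndex(Array(0, 0, 1, 1), Array(1, 1, 0, 0)))  // 1.0 (same partition, relabeled)
```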

def recall(truth: Array[Int], prediction: Array[Int]): Double

In the information retrieval literature, sensitivity is called recall.


def rmse(truth: Array[Double], prediction: Array[Double]): Double

Root mean squared error.

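Illustratively (not the library's code), RMSE is the square root of the mean squared error, which puts it back in the same units as the target variable:

```scala
// RMSE sketch: sqrt of the mean squared residual.
def rmse(truth: Array[Double], prediction: Array[Double]): Double =
  math.sqrt(truth.zip(prediction).map { case (t, p) => (t - p) * (t - p) }.sum / truth.length)

println(rmse(Array(0.0, 0.0), Array(3.0, 4.0)))  // ≈ 3.5355
```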

def rss(truth: Array[Double], prediction: Array[Double]): Double

Residual sum of squares.


def sensitivity(truth: Array[Int], prediction: Array[Int]): Double

Sensitivity or true positive rate (TPR), also called hit rate or recall, is a statistical measure of the performance of a binary classification test. Sensitivity is the proportion of actual positives that are correctly identified as such.

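As an illustration (not the library's implementation), sensitivity divides the true positives by all actual positives:

```scala
// Sensitivity (TPR) sketch: true positives over all actual positives (label 1).
def sensitivity(truth: Array[Int], prediction: Array[Int]): Double = {
  val tp = truth.indices.count(i => truth(i) == 1 && prediction(i) == 1)
  tp.toDouble / truth.count(_ == 1)
}

println(sensitivity(Array(1, 1, 1, 0), Array(1, 1, 0, 0)))  // 2/3 ≈ 0.6667
```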

def specificity(truth: Array[Int], prediction: Array[Int]): Double

Specificity or true negative rate (TNR) is a statistical measure of the performance of a binary classification test. Specificity measures the proportion of actual negatives that are correctly identified.
