smile.anomaly.IsolationForest

All Implemented Interfaces: Serializable

public class IsolationForest extends Object implements Serializable

Isolation forest is an unsupervised learning algorithm for anomaly detection that works on the principle of isolating anomalies. The algorithm recursively generates partitions on the sample by randomly selecting an attribute and then randomly selecting a split value for the attribute, between the minimum and maximum values allowed for that attribute. The recursive partitioning can be represented by a tree structure named Isolation Tree. The number of partitions required to isolate a point can be interpreted as the length of the path, within the tree, to reach a terminating node starting from the root. When the forest of isolation trees collectively produces shorter path lengths for some samples, they are likely to be anomalies.
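The isolation idea described above can be illustrated with a minimal, self-contained sketch. It is not Smile's `IsolationTree`; the class `IsolationPathSketch`, its methods, and the toy data are hypothetical names introduced only for this illustration. The sketch also assumes that the target point is one of the rows of the data, and it omits the subsampling and height limit that a practical implementation would typically apply.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration; not Smile's IsolationTree implementation. */
final class IsolationPathSketch {
    /**
     * Counts the random axis-parallel splits needed to isolate target,
     * which is assumed to be one of the rows of data.
     */
    static int pathLength(double[][] data, double[] target, Random rng, int maxSteps) {
        double[][] current = data;
        int depth = 0;
        for (int step = 0; step < maxSteps && current.length > 1; step++) {
            final int attr = rng.nextInt(target.length); // randomly selected attribute
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : current) {
                min = Math.min(min, row[attr]);
                max = Math.max(max, row[attr]);
            }
            if (min == max) continue; // this attribute cannot split the partition
            // Random split value between the minimum and maximum of the attribute.
            final double split = min + rng.nextDouble() * (max - min);
            final boolean targetLeft = target[attr] < split;
            // Keep only the rows on the same side of the split as the target.
            current = Arrays.stream(current)
                    .filter(row -> (row[attr] < split) == targetLeft)
                    .toArray(double[][]::new);
            depth++;
        }
        return depth;
    }

    /** Averages the path length over several independent random partitionings. */
    static double meanPathLength(double[][] data, double[] target, int trees, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int t = 0; t < trees; t++) {
            sum += pathLength(data, target, rng, 64);
        }
        return sum / trees;
    }

    /** Hypothetical toy data: ten clustered points plus one far-away point. */
    static double[][] toyData() {
        double[][] data = new double[11][];
        for (int i = 0; i < 10; i++) {
            data[i] = new double[] {0.1 * i, 0.1 * i};
        }
        data[10] = new double[] {100.0, 100.0};
        return data;
    }
}
```

On the toy data, the far-away point is usually separated by the very first split, while a point inside the cluster always needs at least two, so its mean path length is longer.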

Rather than selecting a random feature and a split value within the range of the data, the Extended Isolation Forest slices the data using hyperplanes with random slopes. This extension improves the consistency and reliability of the anomaly scores.

For an N-dimensional dataset, we can consider N levels of extension. In the fully extended case, the normal vector of each hyperplane is selected by drawing each component from the standard normal distribution N(0, 1). This results in hyperplanes that can intersect any of the coordinate axes. However, we can exclude some dimensions when specifying the hyperplanes so that they are parallel to the corresponding coordinate axes. This is accomplished by setting the corresponding coordinates of the normal vector to zero. The lowest extension level coincides with the standard Isolation Forest. As the extension level is increased, the bias of the algorithm is reduced. Multiple levels of extension can be useful when the dynamic ranges of the data in different dimensions differ greatly. In such cases, reducing the extension level can help select more appropriate split hyperplanes.
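As a rough sketch of how an extension level could shape a split hyperplane, the code below draws a normal vector from N(0, 1) and zeroes out some of its coordinates. It follows the numbering used in the Extended Isolation Forest paper, where level 0 leaves a single nonzero coordinate (an axis-parallel split) and level N - 1 leaves all of them. Whether Smile numbers its levels the same way is an assumption, and `HyperplaneSketch` and its methods are hypothetical names, not part of the library.

```java
import java.util.Random;

/** Hypothetical illustration of extension levels; not Smile's implementation. */
final class HyperplaneSketch {
    /**
     * Draws a normal vector whose components come from N(0, 1), then sets
     * (dim - extensionLevel - 1) randomly chosen coordinates to zero.
     * Assumed convention: level 0 is axis-parallel, level dim - 1 is fully extended.
     */
    static double[] randomNormal(int dim, int extensionLevel, Random rng) {
        if (extensionLevel < 0 || extensionLevel >= dim) {
            throw new IllegalArgumentException("extensionLevel must be in [0, dim - 1]");
        }
        double[] normal = new double[dim];
        int[] index = new int[dim];
        for (int i = 0; i < dim; i++) {
            normal[i] = rng.nextGaussian();
            index[i] = i;
        }
        int zeros = dim - extensionLevel - 1;
        // Partial Fisher-Yates shuffle: picks `zeros` distinct coordinates to exclude.
        for (int i = 0; i < zeros; i++) {
            int j = i + rng.nextInt(dim - i);
            int tmp = index[i];
            index[i] = index[j];
            index[j] = tmp;
            normal[index[i]] = 0.0;
        }
        return normal;
    }

    /** Returns true if x falls on the side of the hyperplane where (x - p) . n <= 0. */
    static boolean isLeft(double[] x, double[] normal, double[] intercept) {
        double dot = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += (x[i] - intercept[i]) * normal[i];
        }
        return dot <= 0.0;
    }

    /** Counts the nonzero coordinates of a vector. */
    static int nonZeroCount(double[] v) {
        int count = 0;
        for (double value : v) {
            if (value != 0.0) count++;
        }
        return count;
    }
}
```

With a single nonzero coordinate, the test in `isLeft` reduces to comparing one attribute against a threshold, which is the standard Isolation Forest split.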

References

Fei Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation Forest. ICDM, 413–422, 2008.
Sahand Hariri, Matias Carrasco Kind, and Robert J. Brunner. Extended Isolation Forest. IEEE Transactions on Knowledge and Data Engineering, 2019.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static final record

IsolationForest.Options

Isolation Forest hyperparameters.
Constructor Summary

Constructors

Constructor

Description

IsolationForest(int n, int extensionLevel, IsolationTree... trees)

Constructor.
Method Summary

Modifier and Type

Method

Description

static IsolationForest

fit(double[][] data)

Fits an isolation forest.

static IsolationForest

fit(double[][] data, IsolationForest.Options options)

Fits an isolation forest with the given hyperparameters.

int

getExtensionLevel()

Returns the extension level.

double

score(double[] x)

Returns the anomaly score.

double[]

score(double[][] x)

Returns the anomaly scores.

int

size()

Returns the number of trees in the model.

IsolationTree[]

trees()

Returns the isolation trees in the model.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- IsolationForest
  
  public IsolationForest(int n, int extensionLevel, IsolationTree... trees)
  
  Constructor.
  
  Parameters:
  
  n - the number of samples used to train the forest.
  
  extensionLevel - the extension level, i.e. how many dimensions are specified in the random slope.
  
  trees - forest of isolation trees.
Method Details
- fit
  
  public static IsolationForest fit(double[][] data)
  
  Fits an isolation forest.
  
  Parameters:
  
  data - the training data.
  
  Returns:
  
  the model.
- fit
  
  public static IsolationForest fit(double[][] data, IsolationForest.Options options)
  
  Fits an isolation forest with the given hyperparameters.
  
  Parameters:
  
  data - the training data.
  
  options - the hyperparameters.
  
  Returns:
  
  the model.
- size
  
  public int size()
  
  Returns the number of trees in the model.
  
  Returns:
  
  the number of trees in the model.
- trees
  
  public IsolationTree[] trees()
  
  Returns the isolation trees in the model.
  
  Returns:
  
  the isolation trees in the model.
- getExtensionLevel
  
  public int getExtensionLevel()
  
  Returns the extension level.
  
  Returns:
  
  the extension level.
- score
  
  public double score(double[] x)
  
  Returns the anomaly score.
  
  Parameters:
  
  x - the sample.
  
  Returns:
  
  the anomaly score.
- score
  
  public double[] score(double[][] x)
  
  Returns the anomaly scores.
  
  Parameters:
  
  x - the samples.
  
  Returns:
  
  the anomaly scores.
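
This page does not say how the anomaly score is computed. As a point of reference, the sketch below implements the score defined in the Isolation Forest paper by Liu, Ting, and Zhou: s(x, n) = 2^(-E(h(x)) / c(n)), where E(h(x)) is the mean path length of x over the trees and c(n) is the average path length of an unsuccessful binary search tree lookup among n samples. That Smile's `score` follows exactly this formula is an assumption, and `AnomalyScoreSketch` is a hypothetical name.

```java
/** Hypothetical illustration of the score in Liu et al.; not Smile's implementation. */
final class AnomalyScoreSketch {
    /** Euler-Mascheroni constant, used to approximate harmonic numbers. */
    static final double EULER = 0.5772156649;

    /** c(n) = 2 H(n - 1) - 2 (n - 1) / n, with H(i) approximated by ln(i) + EULER. */
    static double c(int n) {
        if (n <= 1) return 0.0;
        if (n == 2) return 1.0;
        double harmonic = Math.log(n - 1) + EULER;
        return 2.0 * harmonic - 2.0 * (n - 1) / n;
    }

    /** Anomaly score in (0, 1]; values close to 1 suggest an anomaly. */
    static double score(double meanPathLength, int n) {
        if (n < 2) {
            throw new IllegalArgumentException("n must be at least 2");
        }
        return Math.pow(2.0, -meanPathLength / c(n));
    }
}
```

Under this definition, a sample whose mean path length equals c(n) scores 0.5, and the score approaches 1 as the mean path length approaches 0.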
