Class IsolationForest
- All Implemented Interfaces:
Serializable
Rather than selecting a random feature and a random split value within the range of the data, the Extended Isolation Forest slices the data using hyperplanes with random slopes. This extension considerably improves the consistency and reliability of the algorithm.
For an N-dimensional dataset, we can consider N levels of extension. In the fully extended case, we select the normal vector of the hyperplane by drawing each component from N(0, 1). This results in hyperplanes that can intersect any of the coordinate axes. However, we can exclude some dimensions when specifying the hyperplanes, so that they are parallel to the corresponding coordinate axes. This is accomplished simply by setting those coordinates of the normal vector to zero.

The lowest level of extension of the Extended Isolation Forest coincides with the standard Isolation Forest. As the extension level of the algorithm increases, the bias of the algorithm is reduced. Having multiple levels of extension can be useful when the dynamic range of the data differs greatly across dimensions. In such cases, reducing the extension level can help select more appropriate split hyperplanes.
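As a minimal sketch of the scheme just described, the helper below draws a split hyperplane at a given extension level. It assumes the convention of the Extended Isolation Forest paper: extension level k leaves k + 1 nonzero components in the normal vector, so level 0 gives axis-parallel cuts (the standard Isolation Forest) and level N - 1 is the fully extended case. Each component is drawn from N(0, 1), then N - 1 - k randomly chosen components are set to zero; the intercept point p is drawn uniformly inside the data's bounding box, and a sample x goes left when (x - p) . n <= 0. The class and method names (`HyperplaneSketch`, `randomNormal`, `randomIntercept`, `goesLeft`, `nonZeros`) are hypothetical and not part of this class's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper illustrating extension levels; not part of the documented API. */
final class HyperplaneSketch {

    /**
     * Draws a normal vector for a split hyperplane in d dimensions. Every component
     * starts as a draw from N(0, 1); then d - 1 - extensionLevel randomly chosen
     * components are zeroed, leaving extensionLevel + 1 nonzero components
     * (assumed convention, as in the Extended Isolation Forest paper).
     */
    static double[] randomNormal(int d, int extensionLevel, Random rng) {
        if (extensionLevel < 0 || extensionLevel > d - 1) {
            throw new IllegalArgumentException("extensionLevel must be in [0, d - 1]");
        }
        double[] n = new double[d];
        List<Integer> dims = new ArrayList<>();
        for (int i = 0; i < d; i++) {
            n[i] = rng.nextGaussian();
            dims.add(i);
        }
        Collections.shuffle(dims, rng);
        for (int k = 0; k < d - 1 - extensionLevel; k++) {
            n[dims.get(k)] = 0.0; // the hyperplane becomes parallel to this axis
        }
        return n;
    }

    /** Draws the intercept point p uniformly inside the bounding box [min, max]. */
    static double[] randomIntercept(double[] min, double[] max, Random rng) {
        double[] p = new double[min.length];
        for (int i = 0; i < p.length; i++) {
            p[i] = min[i] + rng.nextDouble() * (max[i] - min[i]);
        }
        return p;
    }

    /** A sample x goes to the left branch when (x - p) . n <= 0. */
    static boolean goesLeft(double[] x, double[] n, double[] p) {
        double dot = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += (x[i] - p[i]) * n[i];
        }
        return dot <= 0.0;
    }

    /** Counts nonzero components, i.e. how many axes the slope involves. */
    static int nonZeros(double[] v) {
        int count = 0;
        for (double value : v) {
            if (value != 0.0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int d = 4;
        for (int level = 0; level < d; level++) {
            double[] n = randomNormal(d, level, rng);
            // Level 0: one nonzero component (axis-parallel cut);
            // level d - 1: all components nonzero (fully extended).
            System.out.println("extension level " + level + ": " + nonZeros(n) + " nonzero component(s)");
        }
    }
}
```

Zeroing components, rather than drawing fewer of them, keeps the hyperplane test identical at every level; only the number of axes the slope involves changes.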
References
- Fei Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation Forest. ICDM, 413–422, 2008.
- Sahand Hariri, Matias Carrasco Kind, and Robert J. Brunner. Extended Isolation Forest. IEEE Transactions on Knowledge and Data Engineering, 2019.
Nested Class Summary
- static final record IsolationForest.Options: Isolation Forest hyperparameters.
Constructor Summary
- IsolationForest(int n, int extensionLevel, IsolationTree... trees): Constructor.
Method Summary
- static IsolationForest fit(double[][] data): Fits an isolation forest.
- static IsolationForest fit(double[][] data, IsolationForest.Options options): Fits an isolation forest with the given hyperparameters.
- int getExtensionLevel(): Returns the extension level.
- double score(double[] x): Returns the anomaly score.
- double[] score(double[][] x): Returns the anomaly scores.
- int size(): Returns the number of trees in the model.
- trees(): Returns the isolation trees in the model.
Constructor Details
IsolationForest
IsolationForest(int n, int extensionLevel, IsolationTree... trees)
Constructor.
- Parameters:
  n - the number of samples used to train the forest.
  extensionLevel - the extension level, i.e. how many dimensions are specified in the random slope.
  trees - the forest of isolation trees.
Method Details
fit
static IsolationForest fit(double[][] data)
Fits an isolation forest.
- Parameters:
  data - the training data.
- Returns:
  the model.
fit
static IsolationForest fit(double[][] data, IsolationForest.Options options)
Fits an isolation forest with the given hyperparameters.
- Parameters:
  data - the training data.
  options - the hyperparameters.
- Returns:
  the model.
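The fit methods document only their inputs. As a rough illustration of what fitting an extended isolation forest involves, here is a self-contained sketch that follows the procedure of the papers listed in the References: each tree is grown on a random subsample, every internal node splits its samples with a random hyperplane at the configured extension level, and growth stops at a height limit of ceil(log2(psi)) for a subsample of size psi. The class `ForestFitSketch`, its `fit(data, ntrees, subsampleSize, extensionLevel, seed)` signature, and the height-limit rule are assumptions for illustration; they are not the fields of `IsolationForest.Options` or this library's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of fitting an extended isolation forest; not the library's code. */
final class ForestFitSketch {
    /** Internal nodes hold a split hyperplane; external nodes leave normal and intercept null. */
    static final class Node {
        double[] normal, intercept;
        Node left, right;
    }

    /** Grows ntrees trees, each on a random subsample of at most subsampleSize rows. */
    static List<Node> fit(double[][] data, int ntrees, int subsampleSize, int extensionLevel, long seed) {
        if (extensionLevel < 0 || extensionLevel >= data[0].length) {
            throw new IllegalArgumentException("extensionLevel must be in [0, d - 1]");
        }
        Random rng = new Random(seed);
        int psi = Math.min(subsampleSize, data.length);
        int maxDepth = 32 - Integer.numberOfLeadingZeros(psi - 1); // ceil(log2(psi)) for psi >= 1
        List<Node> forest = new ArrayList<>();
        for (int t = 0; t < ntrees; t++) {
            List<double[]> rows = new ArrayList<>(Arrays.asList(data));
            Collections.shuffle(rows, rng); // subsample without replacement
            forest.add(build(rows.subList(0, psi), 0, maxDepth, extensionLevel, rng));
        }
        return forest;
    }

    /** Recursively splits rows with random hyperplanes until one row is left or the height limit is hit. */
    static Node build(List<double[]> rows, int depth, int maxDepth, int extensionLevel, Random rng) {
        Node node = new Node();
        if (depth >= maxDepth || rows.size() <= 1) {
            return node; // external node
        }
        int d = rows.get(0).length;
        double[] min = rows.get(0).clone(), max = rows.get(0).clone();
        for (double[] r : rows) {
            for (int j = 0; j < d; j++) {
                min[j] = Math.min(min[j], r[j]);
                max[j] = Math.max(max[j], r[j]);
            }
        }
        node.normal = new double[d];
        node.intercept = new double[d];
        List<Integer> dims = new ArrayList<>();
        for (int j = 0; j < d; j++) {
            node.normal[j] = rng.nextGaussian(); // slope component drawn from N(0, 1)
            node.intercept[j] = min[j] + rng.nextDouble() * (max[j] - min[j]); // point inside the box
            dims.add(j);
        }
        Collections.shuffle(dims, rng);
        for (int k = 0; k < d - 1 - extensionLevel; k++) {
            node.normal[dims.get(k)] = 0.0; // keep extensionLevel + 1 nonzero components
        }
        List<double[]> left = new ArrayList<>();
        List<double[]> right = new ArrayList<>();
        for (double[] r : rows) {
            (goesLeft(r, node) ? left : right).add(r);
        }
        node.left = build(left, depth + 1, maxDepth, extensionLevel, rng);
        node.right = build(right, depth + 1, maxDepth, extensionLevel, rng);
        return node;
    }

    /** A sample goes left when (x - p) . n <= 0. */
    static boolean goesLeft(double[] x, Node node) {
        double dot = 0.0;
        for (int j = 0; j < x.length; j++) dot += (x[j] - node.intercept[j]) * node.normal[j];
        return dot <= 0.0;
    }

    /** Mean number of edges from the root to the external node that x reaches. */
    static double meanDepth(double[] x, List<Node> forest) {
        double total = 0.0;
        for (Node tree : forest) {
            Node node = tree;
            while (node.normal != null) {
                node = goesLeft(x, node) ? node.left : node.right;
                total++;
            }
        }
        return total / forest.size();
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        double[][] data = new double[500][];
        for (int i = 0; i < data.length; i++) {
            data[i] = new double[] {rng.nextGaussian(), rng.nextGaussian()};
        }
        List<Node> forest = fit(data, 100, 256, 1, 42L); // level 1 = fully extended in 2-D
        System.out.println("mean depth of (0, 0): " + meanDepth(new double[] {0.0, 0.0}, forest));
        System.out.println("mean depth of (6, 6): " + meanDepth(new double[] {6.0, 6.0}, forest));
    }
}
```

In this sketch a point far from the data cloud usually reaches an external node after only a few splits, while a point in the middle of the cloud tends to run into the height limit; that difference in depth is what an anomaly score summarizes.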
size
public int size()
Returns the number of trees in the model.
- Returns:
  the number of trees in the model.
trees
Returns the isolation trees in the model.
- Returns:
  the isolation trees in the model.
getExtensionLevel
public int getExtensionLevel()
Returns the extension level.
- Returns:
  the extension level.
score
public double score(double[] x)
Returns the anomaly score.
- Parameters:
  x - the sample.
- Returns:
  the anomaly score.
score
public double[] score(double[][] x)
Returns the anomaly scores.
- Parameters:
  x - the samples.
- Returns:
  the anomaly scores.
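The score methods do not state how the anomaly score is defined. Assuming this class follows the scoring rule of Liu et al. (2008), listed in the References, the score of a sample x is s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length of x over all trees and c(n) = 2H(n - 1) - 2(n - 1)/n is the average path length of an unsuccessful binary-search-tree search among n samples, with H(i) approximated by ln(i) + 0.5772156649 (Euler's constant). Scores close to 1 suggest anomalies, and scores well below 0.5 suggest normal points. The sketch below is hypothetical: `AnomalyScoreSketch`, `c`, and `score` are not part of the documented API, it takes path lengths as input instead of computing them from trees, it uses the common convention c(2) = 1 and c(n) = 0 for n < 2, and treating the constructor's `n` as the n in c(n) is an assumption.

```java
/** Hypothetical sketch of the Isolation Forest anomaly score; not the library's code. */
final class AnomalyScoreSketch {
    private static final double EULER_GAMMA = 0.5772156649015329;

    /** c(n): average path length of an unsuccessful BST search among n samples. */
    static double c(int n) {
        if (n > 2) {
            double harmonic = Math.log(n - 1.0) + EULER_GAMMA; // H(n - 1) ~ ln(n - 1) + gamma
            return 2.0 * harmonic - 2.0 * (n - 1.0) / n;
        }
        return n == 2 ? 1.0 : 0.0;
    }

    /** s(x, n) = 2^(-E[h(x)] / c(n)), where pathLengths holds h(x) for each tree. */
    static double score(double[] pathLengths, int n) {
        if (n < 2 || pathLengths.length == 0) {
            throw new IllegalArgumentException("need n >= 2 and at least one path length");
        }
        double mean = 0.0;
        for (double h : pathLengths) {
            mean += h;
        }
        mean /= pathLengths.length;
        return Math.pow(2.0, -mean / c(n));
    }

    public static void main(String[] args) {
        int n = 256; // hypothetical training sample size
        System.out.println("c(256) = " + c(n));
        // Short paths (easy to isolate) push the score toward 1 ...
        System.out.println("short paths: " + score(new double[] {2.0, 3.0, 2.5}, n));
        // ... while paths longer than c(n) push it below 0.5.
        System.out.println("long paths:  " + score(new double[] {12.0, 11.0, 13.0}, n));
    }
}
```

Normalizing by c(n) makes scores comparable across models trained on different sample sizes: a sample whose mean path length equals c(n) always scores exactly 0.5.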