Class EditDistance
java.lang.Object
smile.math.distance.EditDistance
- All Implemented Interfaces:
Serializable, ToDoubleBiFunction<String,String>, Distance<String>, Metric<String>
The Edit distance between two strings is a metric for measuring the amount
of difference between two sequences. The Levenshtein distance between two
strings is given by the minimum number of operations needed to transform one
string into the other, where an operation is an insertion, deletion, or
substitution of a single character. A generalization of the Levenshtein
distance (Damerau-Levenshtein distance) allows the transposition of two
characters as an operation.
Given two strings x and y of length m and n (suppose n >= m), this
implementation takes O(ne) time and O(mn) space by an extended Ukkonen's
algorithm in case of unit cost, where e is the edit distance between x and y.
Thus, this algorithm is output sensitive. The smaller the distance, the faster
it runs.
For weighted cost, this implements the regular dynamic programming algorithm, which takes O(mn) time and O(m) space.
- See Also:
-
Constructor Summary
ConstructorsConstructorDescriptionConstructor of Levenshtein distance.EditDistance(boolean damerau) Constructor of Damerau-Levenshtein distance.EditDistance(int initMatrixSize) Constructor of Levenshtein distance.EditDistance(int[][] weight) Constructor.EditDistance(int[][] weight, double radius) Constructor.EditDistance(int initMatrixSize, boolean damerau) Constructor of Damerau-Levenshtein distance. -
Method Summary
Modifier and TypeMethodDescriptiondoubled(char[] a, char[] b) Edit distance between two strings.doubleEdit distance between two strings.static intdamerau(char[] a, char[] b) Damerau-Levenshtein distance between two strings allows insertion, deletion, substitution, or transposition of characters.static intDamerau-Levenshtein distance between two strings allows insertion, deletion, substitution, or transposition of characters.static intlevenshtein(char[] a, char[] b) Levenshtein distance between two strings allows insertion, deletion, or substitution of characters.static intlevenshtein(String a, String b) Levenshtein distance between two strings allows insertion, deletion, or substitution of characters.toString()Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, waitMethods inherited from interface Distance
apply, applyAsDouble, pdist, pdist
-
Constructor Details
-
EditDistance
public EditDistance()Constructor of Levenshtein distance. -
EditDistance
public EditDistance(boolean damerau) Constructor of Damerau-Levenshtein distance.- Parameters:
damerau- if true, calculate Damerau-Levenshtein distance instead of plain Levenshtein distance.
-
EditDistance
public EditDistance(int initMatrixSize) Constructor of Levenshtein distance. The algorithm is highly efficient for small edit distance.- Parameters:
initMatrixSize- the initial DP matrix size, which should be the estimated maximum length of strings that will be feed to this algorithm. If longer strings are encountered, the cost matrix will be resized.
-
EditDistance
public EditDistance(int initMatrixSize, boolean damerau) Constructor of Damerau-Levenshtein distance. The algorithm is highly efficient for small edit distance.- Parameters:
initMatrixSize- the initial DP matrix size, which should be the estimated maximum length of strings that will be feed to this algorithm. If longer strings are encountered, the cost matrix will be resized.damerau- if true, calculate Damerau-Levenshtein distance instead of plain Levenshtein distance.
-
EditDistance
public EditDistance(int[][] weight) Constructor. Weighted Levenshtein distance without path constraints. Only insertion, deletion, and substitution operations are supported.- Parameters:
weight- the weight matrix.
-
EditDistance
public EditDistance(int[][] weight, double radius) Constructor. Weighted Levenshtein distance with Sakoe-Chiba band, which improve computational cost. Only insertion, deletion, and substitution operations are supported.- Parameters:
weight- the weight matrix.radius- the window width of Sakoe-Chiba band in terms of percentage of sequence length.
-
-
Method Details
-
toString
-
d
-
d
public double d(char[] a, char[] b) Edit distance between two strings. O(mn) time and O(n) space for weighted edit distance. O(ne) time and O(mn) space for unit cost edit distance.- Parameters:
a- a string.b- a string.- Returns:
- the distance.
-
levenshtein
-
levenshtein
public static int levenshtein(char[] a, char[] b) Levenshtein distance between two strings allows insertion, deletion, or substitution of characters. O(mn) time and O(n) space.- Parameters:
a- a string.b- a string.- Returns:
- the distance.
-
damerau
-
damerau
public static int damerau(char[] a, char[] b) Damerau-Levenshtein distance between two strings allows insertion, deletion, substitution, or transposition of characters. O(mn) time and O(n) space.- Parameters:
a- a string.b- a string.- Returns:
- the distance.
-