Package smile.math.distance
Class EditDistance
java.lang.Object
smile.math.distance.EditDistance
- All Implemented Interfaces:
Serializable
,ToDoubleBiFunction<String,
,String> Distance<String>
,Metric<String>
The Edit distance between two strings is a metric for measuring the amount
of difference between two sequences. The Levenshtein distance between two
strings is given by the minimum number of operations needed to transform one
string into the other, where an operation is an insertion, deletion, or
substitution of a single character. A generalization of the Levenshtein
distance (Damerau-Levenshtein distance) allows the transposition of two
characters as an operation.
Given two strings x and y of length m and n (suppose n >= m
), this
implementation takes O(ne) time and O(mn) space by an extended Ukkonen's
algorithm in case of unit cost, where e is the edit distance between x and y.
Thus, this algorithm is output sensitive. The smaller the distance, the faster
it runs.
For weighted cost, this implements the regular dynamic programming algorithm, which takes O(mn) time and O(m) space.
- See Also:
-
Constructor Summary
ConstructorDescriptionConstructor.EditDistance
(boolean damerau) Constructor.EditDistance
(int maxStringLength) Constructor.EditDistance
(int[][] weight) Constructor.EditDistance
(int[][] weight, double radius) Constructor.EditDistance
(int maxStringLength, boolean damerau) Constructor. -
Method Summary
Modifier and TypeMethodDescriptiondouble
d
(char[] a, char[] b) Edit distance between two strings.double
Edit distance between two strings.static int
damerau
(char[] a, char[] b) Damerau-Levenshtein distance between two strings allows insertion, deletion, substitution, or transposition of characters.static int
Damerau-Levenshtein distance between two strings allows insertion, deletion, substitution, or transposition of characters.static int
levenshtein
(char[] a, char[] b) Levenshtein distance between two strings allows insertion, deletion, or substitution of characters.static int
levenshtein
(String a, String b) Levenshtein distance between two strings allows insertion, deletion, or substitution of characters.toString()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface smile.math.distance.Distance
apply, applyAsDouble, D, D
-
Constructor Details
-
EditDistance
public EditDistance()Constructor. Multi-thread safe Levenshtein distance. -
EditDistance
public EditDistance(boolean damerau) Constructor. Multi-thread safe Damerau-Levenshtein distance.- Parameters:
damerau
- if true, calculate Damerau-Levenshtein distance instead of plain Levenshtein distance.
-
EditDistance
public EditDistance(int maxStringLength) Constructor. Highly efficient Levenshtein distance but not multi-thread safe.- Parameters:
maxStringLength
- the maximum length of strings that will be feed to this algorithm.
-
EditDistance
public EditDistance(int maxStringLength, boolean damerau) Constructor. Highly efficient Damerau-Levenshtein distance but not multi-thread safe.- Parameters:
maxStringLength
- the maximum length of strings that will be feed to this algorithm.damerau
- if true, calculate Damerau-Levenshtein distance instead of plain Levenshtein distance.
-
EditDistance
public EditDistance(int[][] weight) Constructor. Weighted Levenshtein distance without path constraints. Only insertion, deletion, and substitution operations are supported.- Parameters:
weight
- the weight matrix.
-
EditDistance
public EditDistance(int[][] weight, double radius) Constructor. Weighted Levenshtein distance with Sakoe-Chiba band, which improve computational cost. Only insertion, deletion, and substitution operations are supported.- Parameters:
weight
- the weight matrix.radius
- the window width of Sakoe-Chiba band in terms of percentage of sequence length.
-
-
Method Details
-
toString
-
d
Edit distance between two strings. O(mn) time and O(n) space for weighted edit distance. O(ne) time and O(mn) space for unit cost edit distance. For weighted edit distance, this method is multi-thread safe. However, it is NOT multi-thread safe for unit cost edit distance. -
d
public double d(char[] a, char[] b) Edit distance between two strings. O(mn) time and O(n) space for weighted edit distance. O(ne) time and O(mn) space for unit cost edit distance. For weighted edit distance, this method is multi-thread safe. However, it is NOT multi-thread safe for unit cost edit distance.- Parameters:
a
- a string.b
- a string.- Returns:
- the distance.
-
levenshtein
Levenshtein distance between two strings allows insertion, deletion, or substitution of characters. O(mn) time and O(n) space. Multi-thread safe.- Parameters:
a
- a string.b
- a string.- Returns:
- the distance.
-
levenshtein
public static int levenshtein(char[] a, char[] b) Levenshtein distance between two strings allows insertion, deletion, or substitution of characters. O(mn) time and O(n) space. Multi-thread safe.- Parameters:
a
- a string.b
- a string.- Returns:
- the distance.
-
damerau
Damerau-Levenshtein distance between two strings allows insertion, deletion, substitution, or transposition of characters. O(mn) time and O(n) space. Multi-thread safe.- Parameters:
a
- a string.b
- a string.- Returns:
- the distance.
-
damerau
public static int damerau(char[] a, char[] b) Damerau-Levenshtein distance between two strings allows insertion, deletion, substitution, or transposition of characters. O(mn) time and O(n) space. Multi-thread safe.- Parameters:
a
- a string.b
- a string.- Returns:
- the distance.
-