Edit Distance

Updated on 23 May, 2025
Edit Distance header image

Problem Statement

Given two input strings, word1 and word2, the task is to determine the minimum number of single-character operations required to transform word1 into word2. The permitted operations involve:

  • Inserting a character into the string.
  • Deleting a character from the string.
  • Replacing one character in the string with another.

This problem can be visualized as finding the minimal path through a series of edits that converts the first string into the second string, where each operation counts as a single step along that path.

Examples

Example 1

Input:

word1 = "horse", word2 = "ros"

Output:

3

Explanation:

horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')

Example 2

Input:

word1 = "intention", word2 = "execution"

Output:

5

Explanation:

intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')

Constraints

  • 0 <= word1.length, word2.length <= 500
  • word1 and word2 consist of lowercase English letters.

Approach and Intuition

To solve the problem of converting one string to another with the minimum operations, we can utilize Dynamic Programming (DP), where each subproblem builds upon solutions to smaller subproblems. The core idea revolves around creating a DP table where dp[i][j] represents the minimum number of operations required to convert the substring word1[0...i-1] to word2[0...j-1].

  1. Initialization:

    • If word1 is an empty string, the number of operations to convert it to word2 is simply the length of word2 (all insert operations).
    • If word2 is an empty string, the number of operations to convert word1 to it is the length of word1 (all delete operations).
  2. Recursive Case:

    • For two indices i and j, consider the characters word1[i-1] and word2[j-1]. If these characters are the same, the operations required to transform up to these indices is the same as transforming up to i-1 and j-1, hence dp[i][j] = dp[i-1][j-1].
    • If the characters differ, consider three possible operations:
      • Insert: Inserting the j-th character of word2 into word1 would mean working off the results already obtained for dp[i][j-1].
      • Delete: Deleting the i-th character from word1 would rely on results from dp[i-1][j].
      • Replace: Replacing i-th character of word1 with the j-th of word2 builds on dp[i-1][j-1].
  3. Choose the Minimum:

    • After considering all possible operations, the value at dp[i][j] is the minimal value among the three evaluated possibilities (insert, delete, replace) plus one (the operation itself).
  4. Completion:

    • The final answer, the minimal number of operations required to convert word1 to word2, is found in dp[length of word1][length of word2].

Example Walkthrough

  • Example 1: Converting "horse" to "ros"

    • Starting from 'h' in "horse", replacing it with 'r' forms "rorse".
    • Removing the first 'r' in "rorse" forms "rose".
    • Removing 'e' in "rose" leads to "ros".
    • Total operations: 3
  • Example 2: Converting "intention" to "execution"

    • Initial deletion, replacements and an insertion yield a minimal transformation sequence involving 5 operations, fully detailed in the problem examples.

This approach ensures we systematically evaluate the possible transformations from one string to another, leveraging prior calculations to minimize the overall number of operations, falling within the constraints given (string lengths up to 500).

Solutions

  • C++
  • Java
  • C
  • JavaScript
  • Python
cpp
class Solution {
public:
    vector<vector<int>> distanceMemo;
    int minDistance(string str1, string str2) {
        int len1 = str1.size();
        int len2 = str2.size();
        if (len1 == 0) return len2;
        if (len2 == 0) return len1;
        vector<vector<int>> dp(len1 + 1, vector<int>(len2 + 1, 0));
        for (int i = 1; i <= len1; ++i) {
            dp[i][0] = i;
        }
        for (int j = 1; j <= len2; ++j) {
            dp[0][j] = j;
        }
        for (int i = 1; i <= len1; ++i) {
            for (int j = 1; j <= len2; ++j) {
                if (str1[i - 1] == str2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    dp[i][j] = 1 + min(dp[i - 1][j], min(dp[i][j - 1], dp[i - 1][j - 1]));
                }
            }
        }
        return dp[len1][len2];
    }
};

The provided C++ code defines a solution for calculating the edit distance between two strings, using the dynamic programming approach. This technique essentially measures how many operations (insertions, deletions, or substitutions) are required to transform one string into another. Here's a brief summary of how the provided solution works:

  • First, the solution initializes a 2D vector dp of size (len1 + 1) x (len2 + 1), where len1 is the length of str1 and len2 is the length of str2. Each cell in dp represents the edit distance between substrings of str1 and str2.

  • Initialize the first row and column of the dp table. The first row represents transforming an empty string to all prefixes of str2 by insertions, while the first column represents transforming all prefixes of str1 to an empty string by deletions.

  • Iterate over each character in both strings:

    • If the characters match (str1[i - 1] == str2[j - 1]), set dp[i][j] to dp[i - 1][j - 1] reflecting no change is needed for this character.
    • If the characters do not match, set dp[i][j] to the minimum of three scenarios:
      • Inserting a character (dp[i][j - 1] + 1)
      • Deleting a character (dp[i - 1][j] + 1)
      • Replacing a character (dp[i - 1][j - 1] + 1)
  • Return dp[len1][len2], which contains the edit distance between the entire str1 and str2.

This method ensures an efficient computation of the edit distance by considering each substring incrementally, utilizing the results of prior computations stored in the dp array. The result is a robust solution that scales linearly with the size of the input strings.

java
class Solution {
    public int calculateMinEdits(String text1, String text2) {
        int len1 = text1.length();
        int len2 = text2.length();

        if (len1 == 0) {
            return len2;
        }
        if (len2 == 0) {
            return len1;
        }

        int[][] editDistance = new int[len1 + 1][len2 + 1];

        for (int i = 1; i <= len1; i++) {
            editDistance[i][0] = i;
        }

        for (int j = 1; j <= len2; j++) {
            editDistance[0][j] = j;
        }

        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
                    editDistance[i][j] = editDistance[i - 1][j - 1];
                } else {
                    editDistance[i][j] = 1 + Math.min(
                        editDistance[i - 1][j], // Delete
                        Math.min(
                            editDistance[i][j - 1], // Insert
                            editDistance[i - 1][j - 1] // Replace
                        )
                    );
                }
            }
        }

        return editDistance[len1][len2];
    }
}

This solution in Java addresses the problem of finding the minimum edit distance between two strings, enabling applications like spell correction, DNA sequencing, and other areas where determining how similar two strings are is crucial.

The calculateMinEdits function calculates the minimum number of operations required to convert text1 to text2 where the allowed operations are insert, delete, or replace a character. Here's a breakdown of the approach:

  • Initialize lengths of text1 (len1) and text2 (len2). In cases where either string is empty, the edit distance directly equates to the length of the other string, accommodating for all insert or delete operations.

  • The function uses a dynamic programming approach to build a 2D array editDistance where an element at position [i][j] represents the minimum edit distance between the first i characters of text1 and the first j characters of text2.

  • Populate the first row and the first column of editDistance. These represent scenarios where one string is empty, thereby requiring a number of operations equal to the current index to transform the string (either all inserts or all deletes).

  • Iteratively fill out the 2D array using the following logic:

    • If characters text1[i-1] and text2[j-1] match, set editDistance[i][j] to editDistance[i-1][j-1] as no additional edits are necessary.

    • If they don't match, compute editDistance[i][j] based on the minimum cost of either deleting, inserting, or replacing a character:

      • Deleting text1[i] adjusts the problem to previous characters of text1 and all processed characters of text2 (editDistance[i-1][j]).
      • Inserting a new character to match text2[j] changes the problem to all processed characters of text1 and the previous characters of text2 (editDistance[i][j-1]).
      • Replacing text1[i] with text2[j] or vice versa relates to the problem of the previous characters of both strings (editDistance[i-1][j-1]).
  • The value in editDistance[len1][len2] by the end of these operations gives the minimum edit distance.

This method ensures that solutions are obtained efficiently and accurately, leveraging dynamic programming to reduce the complexity of repeated subproblems.

c
int minimum(int x, int y, int z) {
    int result = x < y ? x : y;
    return result < z ? result : z;
}
int calculateMinDistance(char* s1, char* s2) {
    int len1 = strlen(s1);
    int len2 = strlen(s2);
    if (len1 == 0) {
        return len2;
    }
    if (len2 == 0) {
        return len1;
    }
    int dp[len1 + 1][len2 + 1];
    dp[0][0] = 0;
    for (int i = 1; i <= len1; i++) {
        dp[i][0] = i;
    }
    for (int j = 1; j <= len2; j++) {
        dp[0][j] = j;
    }
    for (int i = 1; i <= len1; i++) {
        for (int j = 1; j <= len2; j++) {
            if (s1[i - 1] == s2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                dp[i][j] = minimum(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]) + 1;
            }
        }
    }
    return dp[len1][len2];
}

This C program computes the Edit Distance (also known as Levenshtein distance) between two strings, crucial for tasks such as natural language processing, data entry verification, and series pattern matching. The program uses dynamic programming to build a solution that determines the minimum number of operations required to transform one string into another, where allowable operations include insertions, deletions, or substitutions of characters.

Here's a brief breakdown of how the solution functions:

  • Minimum Function: Implements a helper function named minimum which calculates the smallest of three integers. This assists in determining the minimum set of operations at each step.

  • Calculate Minimum Distance Function:

    • Begins by retrieving the lengths of the two input strings s1 and s2.
    • It employs a 2D array dp where dp[i][j] holds the minimum edit distance between the first i characters of s1 and the first j characters of s2.
    • Initializes the array with base cases:
      • If one of the strings is empty, the edit distance equals the length of the other string (reflecting the required number of insertions).
    • Iteratively fills up the dp array using the recurrence relation:
      • If the characters match (s1[i - 1] == s2[j - 1]), the cost remains the same as dp[i-1][j-1] (no additional cost).
      • If the characters do not match, it takes the minimum value between deleting, inserting, or substituting a character, then adds 1 to the result.

By the end of the matrix computation, dp[len1][len2] provides the minimum number of transformations required to change the complete s1 to s2. This approach is efficient in terms of both understanding and computing, leveraging dynamic programming to manage otherwise potentially exponential computational costs through naive recursive approaches.

js
var editDistance = function (str1, str2) {
    let len1 = str1.length;
    let len2 = str2.length;
    if (len1 == 0) {
        return len2;
    }
    if (len2 == 0) {
        return len1;
    }
    let distanceTable = Array.from(
        Array(len1 + 1),
        () => new Array(len2 + 1),
    );
    for (let i = 0; i <= len1; i++) {
        distanceTable[i][0] = i;
    }
    for (let j = 0; j <= len2; j++) {
        distanceTable[0][j] = j;
    }
    for (let i = 1; i <= len1; i++) {
        for (let j = 1; j <= len2; j++) {
            if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                distanceTable[i][j] = distanceTable[i - 1][j - 1];
            } else {
                distanceTable[i][j] =
                    Math.min(
                        distanceTable[i - 1][j],
                        distanceTable[i][j - 1],
                        distanceTable[i - 1][j - 1],
                    ) + 1;
            }
        }
    }
    return distanceTable[len1][len2];
};

The provided JavaScript code calculates the edit distance between two strings, str1 and str2. The solution uses dynamic programming to build a 2D table (distanceTable) that records the minimum number of operations needed to convert str1 into str2. The operations considered in computing the edit distance are insertions, deletions, and substitutions.

  • Initialize a table, distanceTable, with dimensions (len1+1) x (len2+1) where len1 and len2 are the lengths of str1 and str2 respectively.
  • Set the first row and first column of distanceTable. These represent the number of operations needed to convert str1 into an empty string and vice versa, acting as base cases for the recurrence relation.
  • Iterate over the matrix starting from distanceTable[1][1] to distanceTable[len1][len2], filling in each cell based on the characters of the strings and the values of the neighboring cells.
    • If the characters in both strings match at any position, the edited distance is the same as that for one character less in both strings.
    • If the characters don't match, the value is the minimum of the three possible previous operations (insert, delete, substitute), plus one.
  • The final cell, distanceTable[len1][len2], will contain the edit distance between str1 and str2.

The runtime complexity of this algorithm is O(len1*len2), making it efficient for moderate-sized strings.

python
class Solution:
    def calculateMinEditDistance(self, str1: str, str2: str) -> int:
        len1 = len(str1)
        len2 = len(str2)
        if len1 == 0:
            return len2
        if len2 == 0:
            return len1
        dp_table = [
            [0] * (len2 + 1) for _ in range(len1 + 1)
        ]
        for i in range(1, len1 + 1):
            dp_table[i][0] = i
        for j in range(1, len2 + 1):
            dp_table[0][j] = j
        for i in range(1, len1 + 1):
            for j in range(1, len2 + 1):
                if str1[i - 1] == str2[j - 1]:
                    dp_table[i][j] = dp_table[i - 1][j - 1]
                else:
                    dp_table[i][j] = 1 + min(
                        dp_table[i - 1][j],
                        dp_table[i][j - 1],
                        dp_table[i - 1][j - 1]
                    )
        return dp_table[len1][len2]

Edit Distance Solution in Python:

The provided Python solution calculates the minimum edit distance between two strings, which refers to the minimum number of operations (insertions, deletions, or substitutions) required to transform one string into the other. This is a common algorithm, especially in applications like text similarity and spelling correction.

The approach uses dynamic programming to build a solution efficiently. Here’s how the solution works:

  • Initializes a 2D list (dp_table) where dp_table[i][j] will hold the minimum edit distance between the first i characters of str1 and the first j characters of str2.
  • Fills in the base cases where one of the strings is empty; the edit distance will then simply be the length of the other string.
  • Iteratively fills the table by comparing characters of str1 and str2. If they match, no operation is needed and it takes the value from the top-left diagonal. If there is a mismatch, it computes the minimum cost between deleting, inserting, or replacing a character.

As a result of this method, the value at dp_table[len1][len2] provides the minimum edit distance for transforming the entire str1 into str2.

Complete these steps to fully grasp the problem and the solution’s dynamic planning method:

  1. Understand the problem domain and the need for calculating edit distances.
  2. Review dynamic programming concepts particularly related to 2D tables.
  3. Analyze the initialization and iterative filling of the table for understanding how indices relate to string operations.
  4. Notice how the solution optimizes computations by building on previously computed values, avoiding redundant operations.
  5. Execute and trace the code with sample inputs to observe how the 2D table populates and reaches the solution.

By following these points, effectively apply this method to relevant problems and adapt the approach for variations or constraints specific to different scenarios.

Comments

No comments yet.