Delete Operation for Two Strings

Updated on 22 May, 2025
Delete Operation for Two Strings header image

Problem Statement

The task is to determine the minimum number of operations required to make two strings, word1 and word2, identical. An operation is defined as deleting exactly one character from either of the two strings. This problem essentially revolves around transforming two words into their longest common subsequence (LCS) by removing the characters that do not appear in the LCS of both strings. The minimum number of deletions needed effectively makes both strings the same, achieved by stripping characters that aren't part of their longest common sequence.

Examples

Example 1

Input:

word1 = "sea", word2 = "eat"

Output:

2

Explanation:

You need one step to make "sea" to "ea" and another step to make "eat" to "ea".

Example 2

Input:

word1 = "vultr", word2 = "lt"

Output:

3

Constraints

  • 1 <= word1.length, word2.length <= 500
  • word1 and word2 consist of only lowercase English letters.

Approach and Intuition

  1. Understanding the Core Concept:
    The primary concept is to minimize the differences between two strings. An effective way of conceptualizing this problem is in terms of creating the longest common subsequence (LCS). The LCS represents the longest sequence of characters that appear in both strings in the same order. By aligning the strings to their LCS, the non-LCS characters are the ones that need to be deleted.

  2. Steps to Compute LCS & Required Deletions:

    • Calculate the length of the longest common subsequence of word1 and word2.
    • The number of deletions required from word1 is (length of word1 - length of LCS).
    • The number of deletions required from word2 is (length of word2 - length of LCS).
    • Total deletions required is the sum of the above two values.
  3. Illustration with Examples:

    • For word1 = "sea" and word2 = "eat", the LCS is "ea", which is of length 2. sea (length 3) requires one deletion (removing 's'), and eat (length 3) also requires one deletion (removing 't'). Thus, a total of 2 deletions are needed.
    • For word1 = "vultr" and word2 = "lt", the LCS is "lt". Here, vultr (length 5) would need 3 deletions (removing 'v', 'u', and 'r'), which equals the total deletions as nothing needs to be removed from lt.
  4. Algorithm Complexity:

    • Given that every character of both strings potentially interacts with every other character in the search for the LCS, this generally leads to a quadratic time complexity, especially in a dynamic programming solution which is commonly used to solve LCS problems. Considering the constraints where lengths can go up to 500, this is feasible.

By following the string transformation into their LCS, one can systematically determine the precise number of edits needed to equalize the two input strings, following the constraints that they comprise only lowercase English letters and their lengths do not exceed moderate limits.

Solutions

  • Java
java
public class Solution {
    public int calculateMinEditDistance(String word1, String word2) {
        int[] previousRow = new int[word2.length() + 1];
        for (int i = 0; i <= word1.length(); i++) {
            int[] currentRow = new int[word2.length() + 1];
            for (int j = 0; j <= word2.length(); j++) {
                if (i == 0 || j == 0)
                    currentRow[j] = i + j;
                else if (word1.charAt(i - 1) == word2.charAt(j - 1))
                    currentRow[j] = previousRow[j - 1];
                else
                    currentRow[j] = 1 + Math.min(previousRow[j], currentRow[j - 1]);
            }
            previousRow = currentRow;
        }
        return previousRow[word2.length()];
    }
}

This solution addresses the task of calculating the minimum number of delete operations required to make two given strings identical. Written in Java, the calculateMinEditDistance method employs a dynamic programming approach.

The primary mechanism utilized is a table (or an array in this case) that incrementsally stores the results of subproblems:

  • Start by initializing a previousRow array of size word2.length() + 1, which keeps track of results from previous iterations.
  • Use a nested loop where the outer loop iterates through each character of word1 and the inner loop through each character of word2.
  • For each pairing of characters from word1 and word2:
    • If either index is 0 (indicating the beginning of word1 or word2), the edit distance is simply the sum of the indices (i + j). This effectively accounts for deleting all previous characters from the longer string.
    • If the current characters of word1 and word2 match (word1.charAt(i - 1) == word2.charAt(j - 1)), the edit distance remains unchanged from the previous step (previousRow[j - 1]).
    • If the characters do not match, calculate the minimum cost by considering either deleting a character from word1 or word2. This is done by taking the minimum of the values from the current row and previous row (1 + Math.min(previousRow[j], currentRow[j - 1])), representing the additional delete operation.
  • Continually update previousRow to the latest currentRow for subsequent iterations.

The process concludes with previousRow[word2.length()], which gives the minimum number of delete operations needed. This approach efficiently uses dynamic programming to build upon previously computed values, ensuring an optimal solution with reduced computational expense. By only saving the current and previous rows rather than the entire matrix, it also optimizes space usage.

Comments

No comments yet.