Remove All Occurrences of a Substring

Updated on 24 June, 2025
Remove All Occurrences of a Substring header image

Problem Statement

Given two strings called s and part, we are tasked with removing all occurrences of the substring part from the string s. This process involves iteratively searching for the leftmost occurrence of the substring part in s and removing it each time it is found. This operation is repeated until no more occurrences of part can be found within s. The final form of the string s, after all such occurrences have been removed, is what we return. This problem is mainly about manipulating strings based on the presence of certain substrings, ensuring all instances are diligently removed.

Examples

Example 1

Input:

s = "daabcbaabcbc", part = "abc"

Output:

"dab"

Explanation:

The following operations are done:
- s = "daabcbaabcbc", remove "abc" starting at index 2, so s = "dabaabcbc".
- s = "dabaabcbc", remove "abc" starting at index 4, so s = "dababc".
- s = "dababc", remove "abc" starting at index 3, so s = "dab".
Now s has no occurrences of "abc".

Example 2

Input:

s = "axxxxyyyyb", part = "xy"

Output:

"ab"

Explanation:

The following operations are done:
- s = "axxxxyyyyb", remove "xy" starting at index 4 so s = "axxxyyyb".
- s = "axxxyyyb", remove "xy" starting at index 3 so s = "axxyyb".
- s = "axxyyb", remove "xy" starting at index 2 so s = "axyb".
- s = "axyb", remove "xy" starting at index 1 so s = "ab".
Now s has no occurrences of "xy".

Constraints

  • 1 <= s.length <= 1000
  • 1 <= part.length <= 1000
  • s​​​​​​ and part consists of lowercase English letters.

Approach and Intuition

  1. Start with the provided strings s and part.
  2. The goal is to continuously search for and remove the substring part from s. The key function to utilize here might be a string search which locates the leftmost occurrence of part in s.
  3. Once part is found, it should be removed, and this adjusted version of s should be the new string on which to continue searching.
  4. This process repeats until part can no longer be found within s.

Examples Breakdown:

  • Example 1:

    • Start with s = "daabcbaabcbc" and part = "abc".
    • In each step, locate abc, remove it, and modify s accordingly, until abc can no longer be found.
    • The modifications through steps show the progressive removal of the substring:
      • daabcbaabcbc -> dabaabcbc -> dababc -> dab
  • Example 2:

    • Starting with s = "axxxxyyyyb" and part = "xy".
    • A similar approach is taken where xy is sought and sequentially removed.
    • Progressive updates:
      • axxxxyyyyb -> axxxyyyb -> axxyyb -> axyb -> ab

This iterative approach ensures that every instance of part is removed before the final string is derived. Such a strategy is comprehensively interpreted from the examples which exhibit all the steps from initial string s to the final result after all transformations.

Solutions

  • C++
  • Java
  • Python
cpp
class Solution {
public:
    string eliminateSubsequence(string str, string sub) {
        vector<int> lps = buildLPS(sub);
        stack<char> stk;
        vector<int> matchPos(str.length() + 1, 0);

        for (int i = 0, j = 0; i < str.length(); i++) {
            char current = str[i];
            stk.push(current);

            if (current == sub[j]) {
                matchPos[stk.size()] = ++j;

                if (j == sub.length()) {
                    for (int k = 0; k < sub.length(); k++) stk.pop();
                    j = stk.empty() ? 0 : matchPos[stk.size()];
                }
            } else {
                if (j != 0) {
                    i--;
                    j = lps[j - 1];
                    stk.pop();
                } else {
                    matchPos[stk.size()] = 0;
                }
            }
        }

        string res;
        while (!stk.empty()) {
            res = stk.top() + res;
            stk.pop();
        }

        return res;
    }

private:
    vector<int> buildLPS(string p) {
        vector<int> lpsArray(p.length(), 0);
        int len = 0;

        for (int i = 1; i < p.length();) {
            if (p[i] == p[len]) {
                lpsArray[i++] = ++len;
            } else {
                if (len != 0) {
                    len = lpsArray[len - 1];
                } else {
                    lpsArray[i++] = 0;
                }
            }
        }

        return lpsArray;
    }
};

The solution provided here is a C++ function to remove all occurrences of a substring from a given string. It uses the KMP (Knuth-Morris-Pratt) algorithm to efficiently search and remove the specified substring. Here’s a concise explanation of the code:

  • The eliminateSubsequence function handles the main logic. It takes two strings, str and sub, where str is the main string and sub is the substring to remove from str.

  • It initializes a stack to keep track of the characters and an array matchPos to manage positions within the matched substring.

  • The function iterates over each character in the main string. For each character that matches a character in sub, it updates a counter j and the matchPos array. If the characters do not match and j is not zero, the function rolls back the index to recheck with the previous longest prefix that also suffixes without the character.

  • If j equals the length of sub, indicating a full match, the function removes the characters corresponding to sub from the stack. The removal is continuous for successive and overlapped occurrences of sub.

  • Once the entire string is processed, characters remaining in the stack are popped to form the resultant string without the occurrences of sub.

  • The buildLPS function is a helper used to precompute the longest prefix which is also a suffix (LPS) array for substring sub, which is crucial for the KMP algorithm to function properly. This precomputation helps in jumping back to the longest matching position when a mismatch happens during the scanning of str.

The implementation is efficient in terms of space and time complexity, enabling the function to handle large strings and substrings effectively. Consider using this approach when the requirement is to remove repeatedly overlapping substrings from a text efficiently.

java
class Manipulator {

    public String deleteSubstrings(String mainStr, String subStr) {
        int[] lps = kmpPreprocess(subStr);
        Stack<Character> stack = new Stack<>();
        int[] matchedIndices = new int[mainStr.length() + 1];

        for (int i = 0, j = 0; i < mainStr.length(); i++) {
            char c = mainStr.charAt(i);
            stack.push(c);

            if (c == subStr.charAt(j)) {
                matchedIndices[stack.size()] = ++j;

                if (j == subStr.length()) {
                    for (int k = 0; k < subStr.length(); k++) {
                        stack.pop();
                    }
                    j = stack.isEmpty() ? 0 : matchedIndices[stack.size()];
                }
            } else {
                if (j != 0) {
                    i--;
                    j = lps[j - 1];
                    stack.pop();
                } else {
                    matchedIndices[stack.size()] = 0;
                }
            }
        }

        StringBuilder result = new StringBuilder();
        while (!stack.isEmpty()) {
            result.append(stack.pop());
        }
        return result.reverse().toString();
    }

    private int[] kmpPreprocess(String pattern) {
        int[] lps = new int[pattern.length()];
        for (int i = 1, len = 0; i < pattern.length();) {
            if (pattern.charAt(i) == pattern.charAt(len)) {
                lps[i++] = ++len;
            } else if (len != 0) {
                len = lps[len - 1];
            } else { 
                lps[i++] = 0;
            }
        }
        return lps;
    }
}

The given Java solution provides a method to remove all occurrences of a specific substring from a string using the Knuth-Morris-Pratt (KMP) algorithm for pattern matching. Here's a breakdown of how the solution works:

  • KMP Preprocessing: The kmpPreprocess method is used to compute the longest prefix which is also suffix (LPS) array for the substring. This array is crucial for efficient backtracking in the KMP algorithm, allowing the algorithm to skip unnecessary comparisons.

  • Main Processing Logic:

    • A Stack<Character> is used to manage the characters of the main string as they are processed.
    • An array matchedIndices keeps track of the indices to which each character of the substring has been matched.
    • For every character in the main string, it is pushed onto the stack and checked against the current character of the substring.
    • If a character matches the corresponding character in the substring, the match index is updated.
    • If a full match of the substring is found (j == subStr.length()), characters are popped from the stack to effectively remove the substring from the main string, and the process continues with the next characters.
    • If a mismatch happens after some matches, the algorithm uses the LPS array to backtrack to the previous possible match without unnecessary comparison or removing characters from the stack.
  • Post-Processing: After processing all characters, the stack might still contain the characters of the main string that do not form the substring. These are then popped from the stack and appended to a StringBuilder. The resulting string, which has all occurrences of the substring removed, is obtained by reversing the StringBuilder content.

This method ensures that all occurrences of the substring are efficiently removed from the main string, even if the substring overlaps with itself or appears multiple times in quick succession. Each character of the main string is processed in a linear pass, making the approach efficient in terms of time complexity, leveraging the KMP algorithm's preprocessing steps. The use of stack and the LPS array enable handling complex cases of substring removal in an optimal manner.

python
class Solution:
    def deleteSubsequence(self, string: str, subsequence: str) -> str:
        # Compute the KMP table for the subsequence
        kmp_table = self._create_kmp(subsequence)

        # Use a list to act as a stack to manage characters
        character_stack = []

        # Track positions of subsequence matches
        match_positions = [0] * (len(string) + 1)

        idx = 0
        subseq_idx = 0
        while idx < len(string):
            char = string[idx]
            character_stack.append(char)

            if char == subsequence[subseq_idx]:
                # Monitor the position for current match count
                match_positions[len(character_stack)] = subseq_idx + 1
                subseq_idx += 1

                # Check if the subsequence is fully matched
                if subseq_idx == len(subsequence):
                    # Remove the matched subsequence from stack
                    for _ in range(len(subsequence)):
                        character_stack.pop()

                    # Reset the matching index based on remaining characters in stack
                    subseq_idx = (
                        0
                        if not character_stack
                        else match_positions[len(character_stack)]
                    )
            else:
                # When characters mismatch
                if subseq_idx != 0:
                    # Utilize the KMP table to skip unnecessary checks
                    idx -= 1
                    subseq_idx = kmp_table[subseq_idx - 1]
                    character_stack.pop()
                else:
                    # Reset match position when mismatch at the first character
                    match_positions[len(character_stack)] = 0

            idx += 1

        # Build the result from the stack content
        return "".join(character_stack)

    def _create_kmp(self, seq: str) -> list:
        # List for the KMP "partial match" table
        lps = [0] * len(seq)

        # Variables to track the position and the length of current prefix
        length = 0
        i = 1
        while i < len(seq):
            if seq[i] == seq[length]:
                length += 1
                lps[i] = length
                i += 1
            else:
                if length != 0:
                    # Use previously computed lps value for optimization
                    length = lps[length - 1]
                else:
                    lps[i] = 0
                    i += 1

        return lps

Here's a solution summary for removing all occurrences of a substring from a given string using Python:

  1. Define a class Solution with the method deleteSubsequence that accepts a string and a subsequence to be removed.
  2. Utilize a KMP (Knuth-Morris-Pratt) algorithm to efficiently search for the substring within the string. Start by creating a KMP table with a helper function _create_kmp.
  3. Use a list as a character stack to build the result of the string after all occurrences of the subsequence are removed.
  4. Initialize a list match_positions to track the positions of the subsequence matches while iterating through the string.
  5. Iterate through each character in string, pushing characters to the character_stack and updating match_positions based on matches with the subsequence.
  6. If the subsequence is found (when the length of the current match equals the length of the subsequence), reset the subseq_idx based on remaining characters in character_stack and remove the matched subsequence from the stack.
  7. If a mismatch is detected and it's not at the start, adjust the iterator based on the KMP table to skip unnecessary checks.
  8. Continue iterating until all characters of string have been processed.
  9. Finally, construct and return the result by joining the characters left in the character_stack.

This approach efficiently removes all occurrences of a subsequence by using the KMP string matching algorithm to avoid redundant checks and optimize the search within the main string.

Comments

No comments yet.