String Compression II

Updated on 11 July, 2025
String Compression II header image

Problem Statement

Run-length encoding (RLE) is a method to compress strings by replacing sequences of repeated characters with the character followed by the number indicating the count of consecutive repetitions. For example, the string "aabccc" would be compressed to "a2bc3": the two 'a's are replaced by "a2" and the three 'c's by "c3". Single characters in the sequence, however, do not follow with the number '1' in this compression scheme.

In this problem, you are given a string s and an integer k, representing the maximum number of characters you can remove from s. The goal is to determine the minimum possible length of the run-length encoded version of s after removing up to k characters. This involves strategically deleting characters from s to minimize the length of the encoded string.

Examples

Example 1

Input:

s = "aaabcccd", k = 2

Output:

4

Explanation:

Compressing s without deleting anything will give us "a3bc3d" of length 6. Deleting any of the characters 'a' or 'c' would at most decrease the length to 5, for instance delete 2 'a's → s = "abcccd" → "abc3d". The optimal strategy is to delete 'b' and 'd', resulting in "a3c3" of length 4.

Example 2

Input:

s = "aabbaa", k = 2

Output:

2

Explanation:

If we delete both 'b' characters, the resulting string is "aaaa" → compressed to "a4" of length 2.

Example 3

Input:

s = "aaaaaaaaaaa", k = 0

Output:

3

Explanation:

Since k is zero, no deletions are allowed. The compressed string is "a11" of length 3.

Constraints

  • 1 <= s.length <= 100
  • 0 <= k <= s.length
  • s contains only lowercase English letters.

Approach and Intuition

  1. Understanding Run-length Encoding RLE reduces length effectively for longer runs of repeated characters. The more continuous the repetitions, the shorter the resulting encoded string.

  2. Impact of Removals Removing disruptive characters (those that break potential long runs) can merge segments and reduce RLE length. Conversely, unnecessary removals may not help and can even worsen the encoding.

  3. Dynamic Programming Optimization Let dp[i][k] represent the minimum compressed length of the suffix starting at index i with k deletions left. We recursively compute this value by choosing to either:

    • Remove the current character (reduce k, continue at i+1)
    • Keep the character, expand a run, and try all possible end positions of this run.
  4. Compression Length Function When forming a run of characters, use a helper to compute the RLE length contribution:

    • "a" → 1
    • "a2" → 2
    • "a10" → 3 Based on the number of digits in the count.
  5. Edge Case Considerations

    • When k = 0, no deletions are allowed—compress the entire string as-is.
    • If the string is already composed of long runs, minimal or no deletions may be optimal.

By applying dynamic programming with efficient memoization, you can determine the optimal characters to remove to minimize the final run-length encoded output.

Solutions

  • Java
java
public class Solution {
    private Map<Integer, Integer> cache = new HashMap<>();
    private Set<Integer> addCount = Set.of(1, 9, 99);
    
    public int optimalCompressionLength(String text, int k) {
        return calcDP(text, 0, (char) ('a' + 26), 0, k);
    }
    
    private int calcDP(String text, int index, char previousChar, int previousCount, int k) {
        if (k < 0) {
            return Integer.MAX_VALUE / 2;
        }
    
        if (index == text.length()) {
            return 0;
        }
    
        int cacheKey = index * 101 * 27 * 101 + (previousChar - 'a') * 101 * 101 + previousCount * 101 + k;
    
        if (cache.containsKey(cacheKey)) {
            return cache.get(cacheKey);
        }
    
        int keepCharacter;
        int eliminateCharacter = calcDP(text, index + 1, previousChar, previousCount, k - 1);
        if (text.charAt(index) == previousChar) {
            keepCharacter = calcDP(text, index + 1, previousChar, previousCount + 1, k) + (addCount.contains(previousCount) ? 1 : 0);
        } else {
            keepCharacter = calcDP(text, index + 1, text.charAt(index), 1, k) + 1;
        }
        int result = Math.min(keepCharacter, eliminateCharacter);
        cache.put(cacheKey, result);
    
        return result;
    }
}

The Java solution for the "String Compression II" problem uses dynamic programming with memoization to optimize the process of string compression under a given limit of character deletions ( k ). The main objective is to minimize the length of the compressed version of the string by strategically eliminating up to ( k ) characters.

The approach involves the following elements:

  • Memoization is implemented using a HashMap called cache to store results of subproblems and avoid redundant calculations. This significantly reduces the execution time by remembering the results of previously computed scenarios.

  • Recursive Function (calcDP): This function computes the optimal compressed length of a substring starting from a given index, with constraints defined by the previous character, its count in the current compression block, and the remaining deletions allowed ( k ).

  • Parameters of calcDP:

    • text: The input text to be compressed.
    • index: The current position in the string.
    • previousChar: The character of the current compression block.
    • previousCount: The count of this character in the current block.
    • k: The remaining deletions allowed.
  • Decision Points within calcDP:

    • If the current character is the same as previousChar, either keep it and potentially increase the count or discard it to use one of the allowed deletions.
    • If it’s different, start a new compression block or decide to remove the character.
  • Performance Optimizations:

    • Compression Length Calculation: Adjusts the added length depending on certain thresholds (1, 9, 99), which reflects how integer lengths change in string formation (e.g., going from "9" to "10" increases the digit count).
    • Cache Key Creation: A unique key for the cache is constructed using a combination of current index, previous character, count, and remaining deletions. This ensures that identical scenarios across function calls do not require re-computation.
  • Base and Edge Cases:

    • If ( k ) becomes negative, the function returns a very large number, indicating an invalid scenario.
    • At the end of the string (when ( index ) equals the length of text), it returns 0 as no more characters are left to process.

This method efficiently calculates the minimum compressed length considering deletions, utilizing both character continuation and strategic eliminations guided by recursive traversal of string choices and memoization of intermediate results.

  • Python
python
class Solution:
    def minimizeLength(self, string: str, deletions: int) -> int:
        @lru_cache(None)
        def compress(index, curr_char, curr_count, deletions_left):
            if deletions_left < 0:
                return float('inf')
            if index == length:
                return 0
    
            remove_this_char = compress(index + 1, curr_char, curr_count, deletions_left - 1)
            if string[index] == curr_char:
                keep_this_char = compress(index + 1, curr_char, curr_count + 1, deletions_left) + (curr_count in [1, 9, 99])
            else:
                keep_this_char = compress(index + 1, string[index], 1, deletions_left) + 1
    
            return min(keep_this_char, remove_this_char)
    
        length = len(string)
        return compress(0, "", 0, deletions)

The given Python solution addresses the problem of compressing a string to a minimal length by potentially deleting up to a specified number of characters. The solution employs dynamic programming with memoization (lru_cache) for efficiency.

Here's an explanation on how the function minimizeLength works:

  • Memoization Setup: The function compress is decorated with lru_cache(None) to cache the results of computations and speed up the process by avoiding redundant calculations.

  • Base Conditions:

    • If deletions_left is less than zero, it indicates that more characters have been removed than allowed, so this path in the recursion returns a very high value (infinity).
    • If index reaches the end of the string (length), the recursion terminates as there are no more characters to process, returning 0.
  • Recursive Decisions:

    • Removing Current Character: It recursively considers skipping the current character by reducing deletions_left and moving to the next index.
    • Keeping Current Character: If the current character matches curr_char, it increments curr_count and moves to the next character without changing the allowed deletions count. If it’s a new character, it resets curr_char and sets curr_count to 1, also adding to the length if this character initates a new count group (1, 9, 99 corresponding to length increases from additional digits).
  • Choosing the Optimal Path: It returns the minimum of either keeping or removing the current character, thus ensuring the minimal length is achieved.

  • Computation: Starts the compression from the first character with no deletions occurred and no characters initially counted.

This approach ensures that the function efficiently finds the minimal length of the string after allowed deletions while considering different scenarios to compress contiguous characters efficiently. The use of a dynamic programming strategy makes the solution suitable for longer strings and higher numbers of deletions, leveraging caching to reduce time complexity markedly.

Comments

No comments yet.