String Compression

Updated on 04 July, 2025
String Compression header image

Problem Statement

The task is about compressing an array of characters, referred to as chars, using a predefined algorithm that focuses on consecutive repeating characters in the input. Each group of similar consecutive characters will either:

  • Remain unchanged if it contains a single character.
  • Be represented by the character followed by the number of its occurrences if the length of the group is greater than one.

The crucial aspect of this task is to directly modify the input character array chars to represent the compressed format and then to return the length of this modified array. Specifically, for sequences with counts of 10 or more, the count is divided into individual digits that stand as separate array elements. The algorithm stipulates using only constant additional space, making in-place modification necessary.

Examples

Example 1

Input:

chars = ["a","a","b","b","c","c","c"]

Output:

Return 6, and the first 6 characters of the input array should be: ["a","2","b","2","c","3"]

Explanation:

The groups are "aa", "bb", and "ccc". This compresses to "a2b2c3".

Example 2

Input:

chars = ["a"]

Output:

Return 1, and the first character of the input array should be: ["a"]

Explanation:

The only group is "a", which remains uncompressed since it's a single character.

Example 3

Input:

chars = ["a","b","b","b","b","b","b","b","b","b","b","b","b"]

Output:

Return 4, and the first 4 characters of the input array should be: ["a","b","1","2"].

Explanation:

The groups are "a" and "bbbbbbbbbbbb". This compresses to "ab12".

Constraints

  • 1 <= chars.length <= 2000
  • chars[i] is a lowercase English letter, uppercase English letter, digit, or symbol.

Approach and Intuition

To achieve the required compression as explained from the given examples, follow this systematic approach:

  1. Initialization:

    • Start with an index write to keep track of where the next character or count should be placed in chars.
    • Use another index to traverse the characters of the array.
  2. Traversing the Input Characters:

    • Traverse through the array using a loop.
    • For each character, check how many times it repeats consecutively.
  3. Compressing Consecutive Characters:

    • If a character appears only once, place it directly in the array at index write, and increment write.
    • If a character repeats, write the character first, then convert the count of repetitions to its string representation and place each digit consecutively in the array.
  4. Consider Count of Character Sequences:

    • For groups whose count is 10 or more, ensure each digit of the count is treated as a separate character in the array.
    • This might require iterating over the digits of the count and placing them successively.
  5. Ending the Compression:

    • Continue this pattern until all characters in the array have been processed.
    • Once the loop completes, write will denote the index representing the length of the compressed character sequence within the chars array.

By following these steps, the algorithm ensures that no additional data structures are used other than the input array, adhering to the constant space constraint. The final output is the length up to which the input array holds the compressed data, which is efficiently managed by the write index.

Solutions

  • C++
cpp
class Solution {
public:
    int modifyString(vector<char>& input) {
        int index = 0, updatedIndex = 0;
        while (index < input.size()) {
            int length = 1;
            while (index + length < input.size() && input[index + length] == input[index]) {
                length++;
            }
            input[updatedIndex++] = input[index];
            if (length > 1) {
                for (char digit : to_string(length)) {
                    input[updatedIndex++] = digit;
                }
            }
            index += length;
        }
        return updatedIndex;
    }
};

Solution Summary: String Compression in C++

In this string compression task, the goal is to compress a sequence of characters by reducing sequences of the same character to the character followed by the count of repetitions if it is greater than one. The implementation employs a simple while loop strategy to iterate over the input characters and compress them in place, making the process efficient and quick.

  • Start by initializing two indices, index used for reading the original characters and updatedIndex used for writing the compressed characters.
  • Use a while loop to check each character in the input vector.
  • For each character, initialize a length variable to keep track of how many times the character is consecutively repeated.
  • Inner loop checks for repeating characters and increments the length.
  • Place the character being processed at the updatedIndex position. Increment updatedIndex.
  • If the length of repetition is greater than one (i.e., the character is repeated), convert the length to a string and loop through each digit, placing each digit into the input vector at the updated index.
  • Increment both index by the length (to move to the next new character) and updatedIndex as required after inserting characters or lengths.
  • After processing all characters, return updatedIndex, which is the new length of the compressed character array.

This approach modifies the input vector in place, eliminating the need for extra space (beyond minimal use for the length-to-string conversion), making it efficient in terms of both time and space.

  • Java
java
class Solution {
    public int compressChars(char[] chars) {
        int index = 0, resultPos = 0;
        while (index < chars.length) {
            int count = 1;
            while (index + count < chars.length && chars[index + count] == chars[index]) {
                count++;
            }
            chars[resultPos++] = chars[index];
            if (count > 1) {
                for (char digit : Integer.toString(count).toCharArray()) {
                    chars[resultPos++] = digit;
                }
            }
            index += count;
        }
        return resultPos;
    }
}

This Java solution compresses strings by grouping consecutive identical characters in an array and replacing them with the character followed by its count. This approach uses in-place modification for efficiency.

  • Start by initializing index to 0 to iterate through the input character array, and resultPos to 0 to track the position in the array where the compressed characters should be inserted.
  • Iterate over the array using a while loop until index is less than the length of the character array.
  • For each character at the current index, count consecutive duplicates by initializing a count variable and using a nested while loop. This nested loop increments count as long as the next character is the same as the current.
  • Place the character into the array at the current resultPos, then increment resultPos.
  • If count is greater than one, convert count to its character array and append each digit to chars at subsequent resultPos, and then increment resultPos accordingly.
  • Increment index by count to skip the compressed characters.
  • Return resultPos, which represents the length of the compressed array.

This method efficiently compresses the array without needing additional space, thereby modifying the original array directly based on the count of consecutive identical characters.

  • Python
python
class Solution:
    def compress_string(self, characters: List[str]) -> int:
        index = 0
        output_pos = 0
        while index < len(characters):
            count = 1
            while (index + count < len(characters) and characters[index + count] == characters[index]):
                count += 1
            characters[output_pos] = characters[index]
            output_pos += 1
            if count > 1:
                count_str = str(count)
                characters[output_pos:output_pos+len(count_str)] = list(count_str)
                output_pos += len(count_str)
            index += count
        return output_pos

This Python solution provides an effective way to compress a string array "characters" by modifying the array in-place and then returning the new length of the modified array.

  • Begin by initializing an index to track the current position in the characters array and an output_pos to indicate where to write the compressed characters.
  • Use a while loop to process each character until the end of the list:
    1. Initialize count to 1.
    2. Use another while loop to count contiguous identical characters.
    3. Store the current character in the compressed output position and increment output_pos.
    4. If count is greater than 1, meaning the character repeats:
      • Convert the count to a string and expand this into the list starting from output_pos.
      • Adjust output_pos by the length of the count string.
    5. Increment the index by count to skip over the already processed characters.
  • At the end, output_pos gives the length of the array after compression, which is returned.

This method effectively reduces the space used whenever there are repeated characters, making it a compact and efficient solution for string compression tasks.

Comments

No comments yet.