Optimal Partition of String

Updated on 14 July, 2025
Optimal Partition of String header image

Problem Statement

The task is to take a given string s and split it into the smallest number of substrings such that each substring contains characters that are entirely unique; no character repeats within the same substring. Essentially, every character within a substring must be distinct from every other character in that same substring. The aim is to determine and return this minimum number of substrings needed.

Examples

Example 1

Input:

s = "abacaba"

Output:

4

Explanation:

Two possible partitions are ("a","ba","cab","a") and ("ab","a","ca","ba").
It can be shown that 4 is the minimum number of substrings needed.

Example 2

Input:

s = "ssssss"

Output:

6

Explanation:

The only valid partition is ("s","s","s","s","s","s").

Constraints

  • 1 <= s.length <= 10^5
  • s consists of only English lowercase letters.

Approach and Intuition

The main goal here is to segment the string s in a way that minimizes the number of contiguous substrings while ensuring that each substring contains no duplicate characters.

Steps in the Approach:

  1. Initiate a cycle through the string while maintaining a record of characters that have occurred in the current substring.
  2. Use a set data structure to check for repetitions. If a character that is about to be added is already in the set, it signifies the start of a new substring.
  3. Continue this until the entire string is processed, counting the substrings formed.
  4. Given the constraints, a linear traversal with a set ensures the solution remains efficient and scalable.

Discussion Based on Examples:

  • Example 1: "abacaba" can form substrings like ("a", "ba", "cab", "a"), which involves breaking when a character that had been accumulated in a set reappears. This is captured by creating a new substring each time a duplicate is seen in the current set of unique characters.
  • Example 2: "ssssss" demonstrates a scenario where every character causes a new substring due to repetition, resulting in 6 substrings for the 6 characters.

This approach guarantees a minimal number of unique-character substrings and operates within the performance requirements dictated by the input size constraints.

Solutions

  • C++
cpp
class Solution {
public:
    int countUniquePartitions(string str) {
        vector<int> lastPosition(26, -1);
        int partitionCount = 1, startIdx = 0;
    
        for (int index = 0; index < str.size(); index++) {
            if (lastPosition[str[index] - 'a'] >= startIdx) {
                partitionCount++;
                startIdx = index;
            }
            lastPosition[str[index] - 'a'] = index;
        }
    
        return partitionCount;
    }
};

The given C++ program provides a solution to the problem of partitioning a string into the maximum number of substrings, such that each substring contains unique characters. Here's a breakdown of how the solution works:

  • lastPosition vector keeps track of the last position of each character in the alphabet relative to the string being analyzed.
  • partitionCount starts at 1, representing at least one partition.
  • startIdx represents the starting index of the current partition.

For each character in the string:

  • If the character has appeared before and its last known position is within the boundary of the current partition (lastPosition[str[index] - 'a'] >= startIdx), then start a new partition and update startIdx to the current character's index.
  • Update the lastPosition of the current character to the current index (lastPosition[str[index] - 'a'] = index).

At the end of the loop, partitionCount will contain the number of unique partitions possible. This approach ensures that you compute the maximum number of substrings where no character repeats within each substring, effectively parsing the string for the optimal partition. The solution works efficiently with a linear pass over the string and constant space relative to the charset involved (i.e., 26 for the English alphabet).

  • Java
java
class Solution {
    public int splitString(String input) {
        int[] charIndex = new int[26];
        Arrays.fill(charIndex, -1);
        int partitions = 1, start = 0;
    
        for (int i = 0; i < input.length(); i++) {
            if (charIndex[input.charAt(i) - 'a'] >= start) {
                partitions++;
                start = i;
            }
            charIndex[input.charAt(i) - 'a'] = i;
        }
    
        return partitions;
    }
}

The provided Java solution aims to determine the optimal number of partitions a string can be divided into so that no letter appears in more than one partition. It works by tracking the last occurrence of each character and adjusting the start of a new partition whenever a character is found that would otherwise conflict with the integrity of the current partition.

Here's a concise explanation of the algorithm:

  1. Initialize an array charIndex to store the most recent index of each character encountered in the string. The array is initially filled with -1 to indicate no occurrence.
  2. Set the number of partitions to 1 as the string will have at least one partition.
  3. Iterate through every character in the input string using a for loop:
    • Calculate the array index for each character by subtracting 'a' from the character itself.
    • Check if the current character was seen after the start of the current partition (start):
      • If yes, increment the partition count (partitions).
      • Update start to the current index i.
    • Update the latest occurrence of the current character in the charIndex array.

Ultimately, return the total number of partitions (partitions) that were calculated. The logic ensures each character only appears in one partition, leading to the optimal subdivision of the string.

  • Python
python
class Solution:
    def countSubstrings(self, input_string: str) -> int:
        last_position = [-1] * 26
        partition_count = 1
        start_index = 0
    
        for index in range(len(input_string)):
            char_index = ord(input_string[index]) - ord('a')
            if last_position[char_index] >= start_index:
                partition_count += 1
                start_index = index
            last_position[char_index] = index
    
        return partition_count

The given Python script defines a solution class method named countSubstrings which aims to analyze optimal partitions of a string such that no letter appears in more than one partition. Here's a breakdown of how the method works:

  • Instantiate an array last_position with a size of 26 (representing each letter of the alphabet) initialized to -1. This array tracks the most recent index where each character was encountered.
  • Set partition_count to 1, starting as the first partition.
  • Initialize start_index at 0 to mark the starting index of the current partition.

The method then iterates through each character in the input string:

  • Calculate char_index by determining the position of the character in the alphabet using its Unicode code point offset by that of 'a'.
  • If the character has been seen after start_index, increment partition_count as this requires starting a new partition. Adjust start_index to the current character's index.
  • Update the last_position array with the current index for the character.

After iterating through the string, the method returns partition_count which reflects the number of distinct partitions needed in order to ensure that no character gets repeated across partitions.

This structured approach ensures an efficient partition strategy by keeping track of character positions dynamically, making the complexity of this solution largely depend on the length of the string, O(n).

Comments

No comments yet.