Split a String Into the Max Number of Unique Substrings

Updated on 11 July, 2025
Split a String Into the Max Number of Unique Substrings header image

Problem Statement

Given a string s, the task is to determine the maximum number of unique substrings into which the string can be divided. Each substring, when concatenated, must reproduce the original string s, implying every character of s must be used exactly once without reshuffling. It is crucial that no two substrings in the division are identical, making each a unique segment of s. Clearly defining a substring, it refers to any contiguous group of characters taken from the string. The challenge lies in splitting the string such that the number of unique segments is maximized, adhering to the mentioned criteria.

Examples

Example 1

Input:

s = "ababccc"

Output:

5

Explanation:

One way to split maximally is ['a', 'b', 'ab', 'c', 'cc'].  
Splitting like ['a', 'b', 'a', 'b', 'c', 'cc'] is not valid as you have 'a' and 'b' multiple times.

Example 2

Input:

s = "aba"

Output:

2

Explanation:

One way to split maximally is ['a', 'ba'].

Example 3

Input:

s = "aa"

Output:

1

Explanation:

It is impossible to split the string any further.

Constraints

  • 1 <= s.length <= 16
  • s contains only lower case English letters.

Approach and Intuition

To tackle the problem, a robust approach involves understanding a few key concepts and considering an efficient way to generate the maximum number of unique substrings:

  1. Initial Observation:

    • Any individual character in the string can serve as a unique substring only once, which informs us immediately about possible ways to avoid repetition.
  2. Recursive Backtracking:

    • One can use a recursive method to try and split the string at different points, and at each split, decide if the resulting substring can be considered (i.e., it hasn’t been considered before in the current recursive path).
  3. Utilization of Set for Uniqueness:

    • A set data structure can be handy to store the substrings temporarily as it helps in maintaining uniqueness (repeated substrings can be skipped if they already exist in the set).
  4. Consideration of All Possible Splits:

    • At every point in the string, if the substring formed up to that point is unique (not in our set of used substrings), one could recursively solve for the rest of the string, marking this substring as used.
  5. Greedy Placement for Maximum Count:

    • Though a greedy algorithm might suggest using the smallest possible unique substrings first, this may not always yield the maximal split; hence, all possibilities should be examined.
  6. Optimization Through Caching:

    • Given the constraints (with s.length up to 16), a recursive solution using backtracking might be effective. However, memorization or caching the results of previous computations could significantly speed up the process, especially when exploring different paths that overlap in terms of substrings being used.

By applying a combination of these strategies, one can efficiently maximize the number of unique substrings obtained from string s.

Solutions

  • C++
cpp
class Solution {
public:
    int calculateMaxUniqueSplit(string s) {
        unordered_set<string> seenSubstrings;
        int maxUniqueCount = 0;
        performBacktrack(s, 0, seenSubstrings, 0, maxUniqueCount);
        return maxUniqueCount;
    }
    
private:
    void performBacktrack(const string& s, int position, unordered_set<string>& seen,
                           int currentCount, int& maxUniqueCount) {
        if (currentCount + (s.length() - position) <= maxUniqueCount) return;
    
        if (position == s.length()) {
            maxUniqueCount = max(maxUniqueCount, currentCount);
            return;
        }
    
        for (int end = position + 1; end <= s.size(); ++end) {
            string substring = s.substr(position, end - position);
            if (seen.find(substring) == seen.end()) {
                seen.insert(substring);
                performBacktrack(s, end, seen, currentCount + 1, maxUniqueCount);
                seen.erase(substring);
            }
        }
    }
};

In the provided C++ solution, the objective is to determine the maximum number of unique substrings that a given string, s, can be split into.

  • The main method calculateMaxUniqueSplit initializes a hash set seenSubstrings to store unique substrings and an integer maxUniqueCount to keep track of the maximum count of unique substrings found.
  • It employs a helper method performBacktrack to perform the actual backtracking logic.
  • In performBacktrack, the function uses recursive calls to explore all possible ways the string can be split into substrings:
    • It checks if continuing the current path might lead to a solution better than the current best (maxUniqueCount). It returns without doing more computation if not.
    • The base case checks if the end of the string s has been reached, updating maxUniqueCount if the currentCount of unique substrings is greater.
    • For each position in the string, the function attempts to take substrings from the current position position to any forward position end. This is represented by the loop iterating from position + 1 to s.size() + 1.
    • It then checks if this new substring has not been previously encountered using the hash set. If unique, it adds the substring to the set, recursively calls itself with the new substring's end as the next position, and then upon return, removes the substring from the set to backtrack and try other possibilities.
  • Through the recursive backtracking, this solution explores all possibilities to find the optimal number of unique substrings into which the input string can be split.

The use of backtracking aids in handling the combinatorial nature of the problem effectively, by exploring all potential splits and updating the maximum count when a potentially better split configuration is found.

  • Java
java
class Solution {
    public int calculateMaxUniqueSplit(String s) {
        Set<String> uniqueParts = new HashSet<>();
        int[] maximumSplitCount = new int[1];
        explorePartitions(s, 0, uniqueParts, 0, maximumSplitCount);
        return maximumSplitCount[0];
    }
    
    private void explorePartitions(
        String s,
        int index,
        Set<String> uniqueParts,
        int currentSplit,
        int[] maximumSplitCount
    ) {
        if (currentSplit + (s.length() - index) <= maximumSplitCount[0]) return;
    
        if (index == s.length()) {
            maximumSplitCount[0] = Math.max(maximumSplitCount[0], currentSplit);
            return;
        }
    
        for (int end = index + 1; end <= s.length(); ++end) {
            String partition = s.substring(index, end);
            if (!uniqueParts.contains(partition)) {
                uniqueParts.add(partition);
                explorePartitions(s, end, uniqueParts, currentSplit + 1, maximumSplitCount);
                uniqueParts.remove(partition);
            }
        }
    }
}

The Java program provided is designed to solve the problem of splitting a given string into the maximum number of unique substrings. The calculateMaxUniqueSplit method initializes a HashSet to keep track of unique substrings and an integer array to store the count of maximum unique splits. This method then calls explorePartitions with initial parameters to find the optimal solution using recursion and backtracking.

In the explorePartitions method:

  • It employs a pruning technique to halt recursion early if the potential number of splits cannot exceed the current maximum found.
  • If the recursive exploration reaches the end of the string (index == s.length()), it updates the maximumSplitCount if currentSplit is greater than the current max.
  • It iterates over possible end points of a substring starting from the current index. For each possible substring:
    • If the substring is not already present in the uniqueParts set, it adds the substring to the set, recursively explores further partitions, and afterwards removes the substring to allow exploration of different partitions (backtracking mechanism).

This implementation efficiently explores all possible unique substrings and updates the maximum count accordingly, using a combination of recursion, backtracking, and pruning to optimize the process. Remember, the solution achieves maximal unique splits by continuously checking and maintaining a set of unique strings across varying recursive paths.

  • Python
python
class Solver:
    def maxDistinctParts(self, input_str: str) -> int:
        observed = set()
        highest_parts = [0]
        self.divide(input_str, 0, observed, 0, highest_parts)
        return highest_parts[0]
    
    def divide(
        self, input_str: str, pos: int, observed: set, parts_count: int, highest_parts: list
    ) -> None:
        if parts_count + (len(input_str) - pos) <= highest_parts[0]:
            return
        if pos == len(input_str):
            highest_parts[0] = max(highest_parts[0], parts_count)
            return
        for cut in range(pos + 1, len(input_str) + 1):
            piece = input_str[pos:cut]
            if piece not in observed:
                observed.add(piece)
                self.divide(input_str, cut, observed, parts_count + 1, highest_parts)
                observed.remove(piece)
        return

This Python solution tackles the problem of splitting a string into the maximum number of unique substrings. The provided Python 3 code defines a class Solver with two methods, maxDistinctParts and a helper method divide, which work together to determine this maximal split.

The approach uses backtracking to explore different ways to split the string, ensuring each part is unique:

  • Define a set named observed to store the unique substrings found so far during the recursion.
  • Use a list highest_parts initialized with [0] to keep track of the maximum count of unique substrings found.
  • The maxDistinctParts function initializes the recursive process by calling divide.
  • divide is a recursive function that uses a loop to try every possible split of the string starting from the pos index:
    • It checks the feasibility of finding a higher count of unique substrings before proceeding further.
    • If the end of the string is reached, it updates the maximum count if the current count (parts_count) is higher.
    • The function extracts a piece of the string and checks if it has not been used before.
    • If unique, it adds this piece to the observed set and recurses to the next position.
    • After recursion, it removes the piece from the set to backtrack and try the next possibility.

Maintain optimal performance and ensure full exploration of potential splits by employing a smart backtracking strategy integrated within the divide function.

Comments

No comments yet.