Longest Substring with At Most K Distinct Characters

Updated on 06 June, 2025
Longest Substring with At Most K Distinct Characters header image

Problem Statement

The problem requires finding a substring within the given string s that meets the following conditions:

  • The substring has the maximum possible length.
  • It contains at most k distinct characters.

This task demands identifying the longest contiguous block of characters in s such that the diversity (in terms of different characters) in that block does not exceed k.

Examples

Example 1

Input:

s = "eceba", k = 2

Output:

3

Explanation:

The substring is "ece" with length 3.

Example 2

Input:

s = "aa", k = 1

Output:

2

Explanation:

The substring is "aa" with length 2.

Constraints

  • 1 <= s.length <= 5 * 104
  • 0 <= k <= 50

Approach and Intuition

Based on the examples and constraints, the optimal approach involves using a sliding window technique:

  1. Initialize Pointers and Data Structure: A left pointer (left) and right pointer (right) are initialized at the start of the string. Additionally, a hashmap or dictionary is used to count characters within the window bounded by left and right.

  2. Expand the Window: The right pointer moves to the right, thereby expanding the window and including new characters till the condition of having at most k distinct characters is violated.

  3. Contract the Window: Once the number of distinct characters exceeds k, increment the left pointer to reduce the window size until the condition of at most k distinct characters is satisfied again.

  4. Update Maximum Length: Throughout this process, continuously update the length of the largest valid (at most k distinct characters) window.

  5. Edge Handling: Special consideration is required for k = 0, which would mean the string could not have any distinct character. The implication is often that only zero-length substrings can be considered, typically leading to a return value of zero or similar based on exact problem constraints. However, depending on problem specifications, this condition might also imply that the task is inconsequential or needs reformatting since asking for a substring with zero distinct characters from a non-empty string s could be a conflicting instruction.

Considering the constraints:

  • The length of s can go up to 50,000 characters, making the algorithm efficiency critical to handle larger inputs effectively.
  • Given k <= 50, the diversity in the substring can't be substantial relative to large possible string lengths, focusing the problem more on efficient scanning and less on handling a massive variety of characters.

Solutions

  • Java
  • Python
java
class Solution {
    public int longestSubstringWithKDistinct(String str, int k) {
        int length = str.length();
        int maxLen = 0;
        Map<Character, Integer> charMap = new HashMap<>();
        
        for (int i = 0; i < length; i++) {
            charMap.put(str.charAt(i), charMap.getOrDefault(str.charAt(i), 0) + 1);
            
            if (charMap.size() <= k) {
                maxLen++;
            } else {
                charMap.put(str.charAt(i - maxLen), charMap.get(str.charAt(i - maxLen)) - 1);
                if (charMap.get(str.charAt(i - maxLen)) == 0) {
                    charMap.remove(str.charAt(i - maxLen));
                }
            }
        }

        return maxLen; 
    }
}

The provided Java code defines a method longestSubstringWithKDistinct to solve the problem of finding the length of the longest substring that contains at most k distinct characters from a given string. The solution employs a sliding window technique and a hash map to keep track of the frequency of characters within the window.

Here's a breakdown of the procedure:

  • Initialize length with the length of the input string and maxLen to zero, which will store the maximum length of the substring encountered.
  • Next, create a HashMap named charMap to count occurrences of each character in the current window of the string.
  • Loop through the string using an index i. For each character:
    • Add it to the charMap or increment its count if it already exists.
    • Check if the current window contains at most k distinct characters. If yes, increment maxLen.
    • If the window has more than k distinct characters, adjust the window from the left by reducing the count of the character at the start of the window, or removing it entirely if its count drops to zero.
  • After processing all characters, return maxLen as the result, which is the size of the longest substring with no more than k distinct characters.

This method ensures that each character of the string is only processed a constant number of times, resulting in a time complexity that is effectively linear relative to the length of the input string.

python
class Solution:
    def longestDistinctSubstr(self, input_str: str, max_distinct: int) -> int:
        longest_len = 0
        char_count = collections.Counter()
        
        for end in range(len(input_str)):
            char_count[input_str[end]] += 1
            
            if len(char_count) <= max_distinct:
                longest_len += 1
            else:
                char_count[input_str[end - longest_len]] -= 1
                if char_count[input_str[end - longest_len]] == 0:
                    del char_count[input_str[end - longest_len]]

        return longest_len

The solution provided is a Python function, longestDistinctSubstr, designed to find the length of the longest substring that contains at most k distinct characters in a given string. The function takes two parameters: input_str, the string to analyze, and max_distinct, the maximum number of distinct characters allowed in the substring.

  • The function initiates by declaring longest_len to zero. This variable will store the maximum length found for complying substrings.
  • char_count is a Counter from Python's collections module, used for tracking the count of each character within the current window (substring) being considered.

The algorithm employs a sliding window technique:

  • It iterates over each character in the input string using a for loop, adjusting the char_count for each character as it goes.
  • As it traverses the string, it checks if the number of distinct characters, represented by the length of char_count, is less than or equal to max_distinct.
    • If true, it increments longest_len by one, effectively extending the current window's length that meets the condition.
  • When the number of distinct characters exceeds max_distinct, the function reduces the count of the character at the window's start and then removes this character from char_count if its count hits zero. This is critical for accurately maintaining the size of the window.

The algorithm efficiently adjusts the size of the window on each iteration, ensuring that it always contains at most k distinct characters. At the completion of the loop, it returns the length of the longest valid substring found, longest_len. This solution leverages the sliding window strategy combined with dynamic hashmap updates to achieve an optimal performance for this problem.

Comments

No comments yet.