
Problem Statement
The problem requires finding a substring within the given string s
that meets the following conditions:
- The substring has the maximum possible length.
- It contains at most
k
distinct characters.
This task demands identifying the longest contiguous block of characters in s
such that the diversity (in terms of different characters) in that block does not exceed k
.
Examples
Example 1
Input:
s = "eceba", k = 2
Output:
3
Explanation:
The substring is "ece" with length 3.
Example 2
Input:
s = "aa", k = 1
Output:
2
Explanation:
The substring is "aa" with length 2.
Constraints
1 <= s.length <= 5 * 104
0 <= k <= 50
Approach and Intuition
Based on the examples and constraints, the optimal approach involves using a sliding window technique:
Initialize Pointers and Data Structure: A left pointer (
left
) and right pointer (right
) are initialized at the start of the string. Additionally, a hashmap or dictionary is used to count characters within the window bounded byleft
andright
.Expand the Window: The
right
pointer moves to the right, thereby expanding the window and including new characters till the condition of having at mostk
distinct characters is violated.Contract the Window: Once the number of distinct characters exceeds
k
, increment theleft
pointer to reduce the window size until the condition of at mostk
distinct characters is satisfied again.Update Maximum Length: Throughout this process, continuously update the length of the largest valid (at most
k
distinct characters) window.Edge Handling: Special consideration is required for
k = 0
, which would mean the string could not have any distinct character. The implication is often that only zero-length substrings can be considered, typically leading to a return value of zero or similar based on exact problem constraints. However, depending on problem specifications, this condition might also imply that the task is inconsequential or needs reformatting since asking for a substring with zero distinct characters from a non-empty strings
could be a conflicting instruction.
Considering the constraints:
- The length of
s
can go up to 50,000 characters, making the algorithm efficiency critical to handle larger inputs effectively. - Given
k <= 50
, the diversity in the substring can't be substantial relative to large possible string lengths, focusing the problem more on efficient scanning and less on handling a massive variety of characters.
Solutions
- Java
- Python
class Solution {
public int longestSubstringWithKDistinct(String str, int k) {
int length = str.length();
int maxLen = 0;
Map<Character, Integer> charMap = new HashMap<>();
for (int i = 0; i < length; i++) {
charMap.put(str.charAt(i), charMap.getOrDefault(str.charAt(i), 0) + 1);
if (charMap.size() <= k) {
maxLen++;
} else {
charMap.put(str.charAt(i - maxLen), charMap.get(str.charAt(i - maxLen)) - 1);
if (charMap.get(str.charAt(i - maxLen)) == 0) {
charMap.remove(str.charAt(i - maxLen));
}
}
}
return maxLen;
}
}
The provided Java code defines a method longestSubstringWithKDistinct
to solve the problem of finding the length of the longest substring that contains at most k
distinct characters from a given string. The solution employs a sliding window technique and a hash map to keep track of the frequency of characters within the window.
Here's a breakdown of the procedure:
- Initialize
length
with the length of the input string andmaxLen
to zero, which will store the maximum length of the substring encountered. - Next, create a
HashMap
namedcharMap
to count occurrences of each character in the current window of the string. - Loop through the string using an index
i
. For each character:- Add it to the
charMap
or increment its count if it already exists. - Check if the current window contains at most
k
distinct characters. If yes, incrementmaxLen
. - If the window has more than
k
distinct characters, adjust the window from the left by reducing the count of the character at the start of the window, or removing it entirely if its count drops to zero.
- Add it to the
- After processing all characters, return
maxLen
as the result, which is the size of the longest substring with no more thank
distinct characters.
This method ensures that each character of the string is only processed a constant number of times, resulting in a time complexity that is effectively linear relative to the length of the input string.
class Solution:
def longestDistinctSubstr(self, input_str: str, max_distinct: int) -> int:
longest_len = 0
char_count = collections.Counter()
for end in range(len(input_str)):
char_count[input_str[end]] += 1
if len(char_count) <= max_distinct:
longest_len += 1
else:
char_count[input_str[end - longest_len]] -= 1
if char_count[input_str[end - longest_len]] == 0:
del char_count[input_str[end - longest_len]]
return longest_len
The solution provided is a Python function, longestDistinctSubstr
, designed to find the length of the longest substring that contains at most k
distinct characters in a given string. The function takes two parameters: input_str
, the string to analyze, and max_distinct
, the maximum number of distinct characters allowed in the substring.
- The function initiates by declaring
longest_len
to zero. This variable will store the maximum length found for complying substrings. char_count
is aCounter
from Python'scollections
module, used for tracking the count of each character within the current window (substring) being considered.
The algorithm employs a sliding window technique:
- It iterates over each character in the input string using a for loop, adjusting the
char_count
for each character as it goes. - As it traverses the string, it checks if the number of distinct characters, represented by the length of
char_count
, is less than or equal tomax_distinct
.- If true, it increments
longest_len
by one, effectively extending the current window's length that meets the condition.
- If true, it increments
- When the number of distinct characters exceeds
max_distinct
, the function reduces the count of the character at the window's start and then removes this character fromchar_count
if its count hits zero. This is critical for accurately maintaining the size of the window.
The algorithm efficiently adjusts the size of the window on each iteration, ensuring that it always contains at most k
distinct characters. At the completion of the loop, it returns the length of the longest valid substring found, longest_len
. This solution leverages the sliding window strategy combined with dynamic hashmap updates to achieve an optimal performance for this problem.
No comments yet.