Longest Substring with At Most Two Distinct Characters

Updated on 06 June, 2025
Longest Substring with At Most Two Distinct Characters header image

Problem Statement

The task is to find the length of the longest substring within a given string s that contains no more than two distinct characters. A substring is defined as a contiguous sequence of characters within a string. This problem involves identifying such subsequences where the diversity of characters is limited to two distinct types.

Examples

Example 1

Input:

s = "eceba"

Output:

3

Explanation:

The substring is "ece" which its length is 3.

Example 2

Input:

s = "ccaabbb"

Output:

5

Explanation:

The substring is "aabbb" which its length is 5.

Constraints

  • 1 <= s.length <= 105
  • s consists of English letters.

Approach and Intuition

To solve this problem of finding the longest substring with at most two distinct characters, one efficient way is through the use of the sliding window technique combined with a hash map to track the distinct characters within the window. Here is a step-by-step breakdown of the approach:

  1. Initialize two pointers or indices, namely start and end both set to the beginning of the string. These will represent the current window of examination.

  2. Utilize a hash map (char_count) to track the frequency of each character within the current window defined by start and end.

  3. Iterate over the string by expanding the end pointer. Add each character encountered into the hash map and update its count.

  4. Whenever the size of the hash map exceeds two (which means the window now contains more than two distinct characters), increment the start pointer to shrink the window from the left. During this process, reduce the count of the character at the start position in the hash map, and if its count drops to zero, remove the character from the hash map.

  5. Throughout the iteration, maintain a variable to store the maximum size of the window when the window contains at most two distinct characters.

Example Walk-throughs

  • Example 1:

    • Input: s = "eceba"
    • Initializing start and end to 0, the substrings grow as "e", "ec", "ece". At point "ece", when end is at index 2, we have 2 distinct characters.
    • Moving end to index 3 introduces 'b', making it three distinct characters. We therefore move start to index 1 which makes the substring "ceb". Hence, then trimming more to "ce" or "eb", and proceeding further allows substrings like "eba". The longest valid substring is "ece" with length 3.
  • Example 2:

    • Input: s = "ccaabbb"
    • Start with the window "cc", then "cca", and expanding to "ccaa". At "ccaab", the window surpasses two distinct characters limit.
    • Adjusting start to drop excess characters, we then grow to get "aab" and later "aabbb". Finally, "aabbb" is the longest valid substring with length 5.

By applying a direct examination of each potential window and efficiently tracking character counts, this method provides a clear and robust solution to the problem.

Solutions

  • C++
  • Java
  • C
  • JavaScript
  • Python
cpp
class Solution {
public:
    int longestDistinctTwoCharSubstring(string s) {
        size_t len = s.size();
        if (len < 3) {
            return len;
        }

        int start = 0, end = 0;
        map<char, int> positionMap;
        int longest = 2;

        while (end < len) {
            positionMap[s[end]] = end;
            end++;

            if (positionMap.size() == 3) {
                int deleteIndex = INT_MAX;
                for (auto &entry : positionMap) {
                    deleteIndex = min(deleteIndex, entry.second);
                }
                positionMap.erase(s[deleteIndex]);
                start = deleteIndex + 1;
            }
            longest = max(longest, end - start);
        }
        return longest;
    }
};

The given C++ code provides a solution to find the length of the longest substring that contains at most two distinct characters. Here’s a breakdown of how the code efficiently accomplishes this:

  • The function longestDistinctTwoCharSubstring accepts a string s as input.
  • After initializing required variables and a map to track character positions in the substring, the code dynamically expands and contracts the substring within a single pass using two pointers strategy - start and end.
  • The variable end is incremented to expand the substring and the characters and their latest positions are stored in positionMap.
  • If the map grows to contain three distinct characters, the program identifies the character with the least recent position (the smallest index) and removes it from the map. The start pointer is then adjusted to the next index following the removed character, effectively contracting the substring.
  • Throughout this process, it continually updates the longest variable to maintain the length of the longest valid substring found.
  • This loop continues until the end of the string is reached, ensuring all possible substrings are considered.

The approach balances efficient substring expansion and contraction with dynamic tracking of character positions, ensuring optimal performance with a complexity primarily driven by the string length. This method is particularly efficient for cases where the string is significantly longer than the alphabet size used.

java
class Solution {
    public int maxSubstringLengthTwoDistinct(String str) {
        int strLen = str.length();
        if (strLen < 3) return strLen;

        int start = 0;
        int end = 0;
        HashMap<Character, Integer> charPositionMap = new HashMap<>();

        int maxLength = 2;

        while (end < strLen) {
            charPositionMap.put(str.charAt(end), end++);

            if (charPositionMap.size() == 3) {
                int minIndex = Collections.min(charPositionMap.values());
                charPositionMap.remove(str.charAt(minIndex));
                start = minIndex + 1;
            }

            maxLength = Math.max(maxLength, end - start);
        }
        return maxLength;
    }
}

This solution in Java addresses the problem of finding the length of the longest substring that contains at most two distinct characters.

  1. First, check if the length of the input string is less than three. If so, return the string length directly, as it naturally contains fewer than or equal to two distinct characters.

  2. Initialize two pointers, start and end, to manage the sliding window within the string, as well as a HashMap called charPositionMap to store the most recent positions of characters.

  3. Set an initial maxLength to 2, as the minimum length of substring when the window has two distinct characters.

  4. Use a loop to iterate through the string. For each character at the position end, store its position in charPositionMap and increment end.

  5. When charPositionMap has three distinct characters, identify the character with the minimum index in the map and remove it to maintain only two distinct characters in the map. Adjust the start pointer just past the removed character's position.

  6. Continuously update maxLength with the larger value between the current maxLength and the length of the current valid substring (from start to end).

  7. Finally, return the computed maxLength.

This approach efficiently manages the search space using a sliding window technique and a map to track character positions, ensuring that the substring always contains two or fewer distinct characters.

c
struct storage {
    char keyId;
    int value;
    UT_hash_handle hhRef;
};

int findMaxLengthTwoDistinctCharacters(char *str) {
    size_t length = strlen(str);
    if (length < 3) {
        return length;
    }

    int start = 0, end = 0;
    struct storage *storageMap = NULL, *item, *tempItem;
    int maxLength = 2;

    while (end < length) {
        HASH_FIND(hhRef, storageMap, &str[end], sizeof(char), item);
        if (item != NULL) {
            HASH_DEL(storageMap, item);
            free(item);
        }

        item = (struct storage *)malloc(sizeof(*item));
        if (item == NULL) return -1;

        item->keyId = str[end];
        item->value = end;
        HASH_ADD_KEYPTR(hhRef, storageMap, &item->keyId, sizeof(char), item);
        end++;

        if (HASH_COUNT(storageMap) == 3) {
            int minIdx = INT_MAX;
            HASH_ITER(hhRef, storageMap, item, tempItem) {
                minIdx = minIdx < item->value ? minIdx : item->value;
            }

            HASH_FIND(hhRef, storageMap, &str[minIdx], sizeof(char), item);
            if (item != NULL) {
                HASH_DEL(storageMap, item);
                free(item);
            }
            start = minIdx + 1;
        }
        maxLength = maxLength > (end - start) ? maxLength : (end - start);
    }
    return maxLength;
}

This solution addresses the problem of finding the length of the longest substring with at most two distinct characters from a given string. The implementation, written in C, uses a combination of a hashing technique and a sliding window approach.

  • A struct storage is defined to store each character (keyId) and its latest position (value) in the string. UT_hash_handle is utilized to enable hashing of the struct.
  • The function findMaxLengthTwoDistinctCharacters receives a pointer to the character string. It initially checks if the string length is less than 3, immediately returning the string's length since the result is trivial in such cases.
  • A sliding window is utilized with two pointers, start and end, initially at the start of the string. storageMap is introduced to keep track of up to two unique characters and their last occurrences.
  • As end increases, it examines each character until the end of the string:
    • If the character already exists in storageMap, its entry gets updated.
    • For each new character, it creates a new struct storage, assigns the character and its current index, and adds it to the storageMap.
    • The sliding window adjusts when storageMap exceeds two distinct characters by removing the character with the smallest index and moving the start pointer forward.
  • The length of the current valid substring (end - start) is compared against maxLength to keep track of the maximum found so far.
  • When the traversal is complete, the function returns maxLength, representing the length of the longest substring with at most two distinct characters.

This implementation effectively manages the sliding window and character index with dynamic memory management and hashing structures, providing an efficient solution to the problem.

js
var maxSubstringTwoDistinctChars = function (str) {
    let len = str.length;
    if (len < 3) return len;

    let start = 0;
    let end = 0;
    let charPositions = {};

    let max_length = 2;

    while (end < len) {
        charPositions[str[end]] = end++;

        if (Object.keys(charPositions).length === 3) {
            let minIndex = Math.min(...Object.values(charPositions));
            delete charPositions[str[minIndex]];
            start = minIndex + 1;
        }

        max_length = Math.max(max_length, end - start);
    }
    return max_length;
};

This JavaScript solution addresses the problem of finding the longest substring that contains at most two distinct characters in a given string.

The function maxSubstringTwoDistinctChars starts by checking if the string length is less than three. If it is, the function directly returns the string length, as any string shorter than three inevitably meets the criteria.

The core logic involves two pointers, start and end, which initially point to the beginning of the string. The charPositions object keeps track of the last occurrence index of each character encountered as end traverses through the string.

  • Initialize max_length to 2, since the minimum meaningful output when two distinct characters are present is a substring of length 2.
  • Use a while loop to iterate through the string. For each character at position end, update its latest occurrence in charPositions and increment end.
  • Check if the number of distinct characters, determined by the key count in charPositions, exceeds two. If it does, calculate the minimum index from the current character positions to find the oldest character position, remove this character from charPositions, and update start to focus only on the substring from the next index of this oldest character.
  • Continuously update max_length with the maximum between its current value and the length of the current substring defined by start to end.

At the end of the iteration, max_length contains the length of the longest substring composed of at most two distinct characters. This is returned as the final result, presenting an efficient approach with a time complexity of O(n), where n is the length of the input string.

python
from collections import defaultdict

class Solution:
    def longestSubstringTwoDistinctChars(self, s: str) -> int:
        length = len(s)
        if length < 3:
            return length

        # Initialize window pointers and character map
        start, end = 0, 0
        char_map = defaultdict()

        longest = 2

        while end < length:
            # Update the current character's last seen index
            char_map[s[end]] = end
            end += 1

            # If the map contains 3 distinct characters, shrink the window
            if len(char_map) == 3:
                # Find the index of the oldest character and remove it
                oldest = min(char_map.values())
                del char_map[s[oldest]]
                # Adjust the start of the window
                start = oldest + 1

            # Calculate the maximum length encountered
            longest = max(longest, end - start)

        return longest

The Python program provided tackles the challenge of finding the length of the longest substring that contains at most two distinct characters from a given string s. Follow these steps to understand the process:

  1. Check if the length of string s is less than 3. Return this length as any substring will have at most two distinct characters in this case.
  2. Initialize two pointers, start and end, to manage the sliding window and a dictionary char_map to track the characters and their indices within the window.
  3. Set the initial value of longest to 2, which will store the maximum length of substrings found that meet the criteria.
  4. Utilize a while loop to iterate as long as end is less than the string length. Within the loop:
    • Map the current character at position end to its index.
    • Increment the end pointer to slide the window to the right.
    • If char_map contains three distinct characters, identify the index of the earliest occurring character (oldest) and remove it from the char_map. Then, adjust the start pointer to the position right after this oldest character.
    • Update longest with the larger value between longest and the current window size (end - start).
  5. Finally, return the value stored in longest as the length of the longest substring with at most two distinct characters.

This solution effectively manages the sliding window and utilizes a dictionary to keep track of character occurrences, optimizing the process of finding the correct substring length.

Comments

No comments yet.