Check If a String Contains All Binary Codes of Size K

Updated on 19 May, 2025
Check If a String Contains All Binary Codes of Size K header image

Problem Statement

Given a binary string s and an integer k, the task is to determine whether every possible binary code of length k exists as a substring within s. If s contains all such combinations, the function should return true; otherwise, it should return false. The challenge lies in checking for the presence of each combination effectively, especially given the constraints where the length of s can be quite large, up to 500,000 characters.

Examples

Example 1

Input:

s = "00110110", k = 2

Output:

true

Explanation:

The binary codes of length 2 are "00", "01", "10" and "11". They can be all found as substrings at indices 0, 1, 3 and 2 respectively.

Example 2

Input:

s = "0110", k = 1

Output:

true

Explanation:

The binary codes of length 1 are "0" and "1", it is clear that both exist as a substring.

Example 3

Input:

s = "0110", k = 2

Output:

false

Explanation:

The binary code "00" is of length 2 and does not exist in the array.

Constraints

  • 1 <= s.length <= 5 * 105
  • s[i] is either '0' or '1'.
  • 1 <= k <= 20

Approach and Intuition

  1. Understanding the problem:

    • We need to check if all possible binary strings of length k are substrings of a given binary string s.
    • The total number of distinct binary combinations of length k is ( 2^k ).
  2. Initial considerations:

    • If ( k ) is large, specifically close to the length of ( s ), ( s ) might not be long enough to contain all possible combinations.
    • Conversely, if ( k ) is small, ( s ) has a higher chance of containing all combinations due to the small number of combinations.
  3. Utilizing a sliding window approach:

    • We can employ a sliding window of size ( k ) that moves through ( s ).
    • As the window moves, it captures substrings of length ( k ).
  4. Storing observed patterns:

    • Use a data structure, such as a set, to store unique substrings of length ( k ) as we slide the window.
    • This aids in checking off observed combinations against the ( 2^k ) possible combinations.
  5. Efficiency checks:

    • Once the set of substrings reaches the size ( 2^k ) (since we are only looking for that many distinct substrings), the function can return true.
    • If the end of the string is reached and we haven’t observed all possible substrings, return false.
  6. Edge case considerations:

    • If the length of ( s ) is less than ( k ), it’s impossible for ( s ) to contain all combinations, so return false immediately.
    • For very small values of ( k ) (e.g., ( k = 1 )), a simple check for the presence of '0' and '1' might suffice.

By applying this methodical approach, which efficiently checks for the presence of all required substrings without repetitive calculations, the solution can handle high constraints effectively, staying within feasible execution time limits.

Solutions

  • Java
  • Python
java
class Solution {
    public static boolean containsAllKLengthSubstrings(String input, int k) {
        int required = 1 << k;
        boolean[] found = new boolean[required];
        int mask = required - 1;
        int hashValue = 0;

        for (int index = 0; index < input.length(); index++) {
            hashValue = ((hashValue << 1) & mask) | (input.charAt(index) - '0');
            if (index >= k - 1 && !found[hashValue]) {
                found[hashValue] = true;
                required--;
                if (required == 0) {
                    return true;
                }
            }
        }
        return false;
    }
}

This Java solution determines if a given string contains all possible substrings of binary codes of size k. To achieve this, follow the established methodologies:

  • Calculate required, the total number of possible binary codes of size k, using bit shifting (1 << k).
  • Utilize a boolean[] found to keep track of which binary codes have been found.
  • Use mask (computed as required - 1) to restrict the hashValue to the last k bits.
  • Iterate over each character of the input string, updating the hashValue by left shifting it, applying the mask, and adding the current binary value of the character.
  • Check if the k-length substring represented by hashValue has not been found yet; if true, mark it as found and decrease required by 1.
  • If required becomes 0, return true indicating all binary codes of length k are present in the string.
  • If the loop completes without finding all codes, return false.

Ensure correct input parameters with the string being potentially very large and k being an appropriate integer. This method effectively tracks the occurrence of binary codes using a space-efficient approach.

python
class Solution:
    def containsAllBinaryCodes(self, sequence: str, length: int) -> bool:
        required = 1 << length
        seen = [False] * required
        full_mask = required - 1
        current_hash = 0

        for j in range(len(sequence)):
            # Calculate the bit hash for current window
            current_hash = ((current_hash << 1) & full_mask) | (int(sequence[j]))
            # Mark as seen if the current hash index marks first occurrence
            if j >= length - 1 and not seen[current_hash]:
                seen[current_hash] = True
                required -= 1
                if required == 0:
                    return True
        return False

The Python3 solution defines a method containsAllBinaryCodes, which checks if a string contains all possible binary codes of a given size k. The approach utilizes bit manipulation and a sliding window technique to efficiently solve the problem. Here’s a breakdown of the implementation:

  • Initialize required to be (2^k) (number of unique binary codes of size k).
  • Create a seen list of boolean values to track which binary codes are identified in the sequence.
  • Calculate full_mask as (2^k - 1). This mask helps manage the hash to be within the scope of k-bits.
  • Initialize current_hash to track the current bit pattern of the window of size k in the sequence.

For each character sequence[j]:

  • The current_hash is updated using bit manipulation. Left shift to make room for the new bit and use the OR operation to add the current character as the least significant bit.
  • The sliding window (denoted by current_hash) checks if the current bit pattern has been seen. If it's a fresh pattern and its index was not seen, the seen index is marked True and required is decremented.
  • Reducing required each time a new code is encountered ensures that once it reaches zero, all possible codes of length k are found in the sequence.

Should all binary codes be encountered before the end of the string, the function returns True indicating the string contains all binary codes of size k. If the loop completes without finding all codes, it returns False.

Comments

No comments yet.