Count Number of Homogenous Substrings

Updated on 09 May, 2025
Count Number of Homogenous Substrings header image

Problem Statement

In this problem, we are given a string s and are tasked with counting the number of its homogenous substrings. A homogenous substring is defined as a sequence of characters where all characters are the same. Each contiguous substring that fits this definition should be counted. The solution's result must be returned as a modulo 10^9 + 7, to handle potentially large numbers due to the constraints on the string length.

Exploring the definition through examples:

  • For a string "aaa", the homogenous substrings include "a", "a", "a", "aa", "aa", and "aaa". Each individual occurrence of a character as well as each possible contiguous grouping of identical characters should be considered.

Given the constraints, where the length can be as large as 100,000 and only lowercase letters are involved, efficient string processing is critical to manage potential time complexities.

Examples

Example 1

Input:

s = "abbcccaa"

Output:

13

Explanation:

The homogenous substrings are listed as below:
"a" appears 3 times.
"aa" appears 1 time.
"b" appears 2 times.
"bb" appears 1 time.
"c" appears 3 times.
"cc" appears 2 times.
"ccc" appears 1 time.
3 + 1 + 2 + 1 + 3 + 2 + 1 = 13.

Example 2

Input:

s = "xy"

Output:

2

Explanation:

The homogenous substrings are "x" and "y".

Example 3

Input:

s = "zzzzz"

Output:

15

Constraints

  • 1 <= s.length <= 105
  • s consists of lowercase letters.

Approach and Intuition

The task is to identify all contiguous blocks of the same characters and determine the count of all possible substrings within these blocks.

Steps to Solve:

  1. Traverse through the string s using a pointer or index.

  2. Start by identifying the beginning of a homogenous block. This is easily spotted when the current character does not match the previous character (or if we are at the start of the string).

  3. Maintain a count of how many times the same character appears consecutively. Let’s call this length L for any block.

  4. Using the property that a single character can form 1 substring, two consecutive characters can form 1 + 2 = 3 substrings, and so on, the total number of substrings for a homogenous block of length L can be computed using the formula L * (L + 1) / 2.

  5. As you detect the end of a homogenous block (either when s[i] is not equal to s[i-1] or the string has ended), compute and add the result of L * (L + 1) / 2 to the total count of substrings.

  6. Be sure to include the result modulo 10^9 + 7 at each addition to keep numbers manageable and handle the overflow.

Example Analysis:

  • For s = "abbcccaa", the sequence of operations will be:
    • Find 'a'; length is 1, yielding 1 substring.
    • Find 'bb'; length is 2, yielding 2*(2+1)/2 = 3 substrings.
    • And so forth, until the end of the string.

This approach minimizes the need to store or directly count each seen substring, instead leveraging mathematical properties to efficiently compute required values based on identified patterns in the data, adhering well to the problem’s constraints.

Solutions

  • C++
  • Java
  • Python
cpp
class Solution {
public:
    int countContiguous(string str) {
        int total = 0;
        int streak = 0;
        int CONST_MOD = 1000000007;
        
        for (int index = 0; index < str.length(); index++) {
            if (index == 0 || str[index] == str[index - 1]) {
                streak++;
            } else {
                streak = 1;
            }
            
            total = (total + streak) % CONST_MOD;
        }
        
        return total;
    }
};

In this solution summary for the problem of counting the number of homogeneous substrings, the code is implemented in C++.

Discover how to process a given string to count all contiguous substrings where all characters are the same using the countContiguous function. This function:

  • Initializes a variable total to keep track of the total number of homogeneous substrings.
  • Initializes a streak variable to count the length of the current homogeneous substring.
  • Defines a constant CONST_MOD set to 1000000007 to ensure the result remains manageable and to handle large values by taking modulo CONST_MOD.

Explore the use of a for loop to iterate over each character in the string:

  1. For the first character or if the current character is the same as the previous, increment the streak.
  2. If the character is different, reset the streak to 1.
  3. After updating the streak, add its value to total, and take modulo CONST_MOD to update the total.

Finally:

  • The function returns the total, which represents the count of all homogeneous substrig in the provided string.
java
class Solution {
    public int calculateHomogenousSubstrings(String str) {
        int total = 0;
        int streak = 0;
        int MODULO = (int) 1e9 + 7;
        
        for (int i = 0; i < str.length(); i++) {
            if (i == 0 || str.charAt(i) == str.charAt(i - 1)) {
                streak++;
            } else {
                streak = 1;
            }
            
            total = (total + streak) % MODULO;
        }
        
        return total;
    }
}

This summary guides you through the Java solution for counting the number of homogenous substrings in a given string.

  • Start with setting up initial variables. total tracks the cumulative count of homogenous substrings, streak counts the length of the current sequence of identical characters, and MODULO is used to avoid overflow by performing modulo operation with (10^9 + 7).

  • Iterate through the provided string using a for loop. On each iteration, check if the current character matches the previous character (or if it's the first character being checked). If they match or if it's the first character, increment the streak count. Otherwise, reset streak to 1, meaning a new sequence of identical characters starts here.

  • After determining the streak value, add it to the total count. To ensure the number remains within bounds and prevent overflow, use the modulo operation with MODULO.

  • At the end of the loop, return the value of total, which represents the count of all homogenous substrings in the string.

This approach efficiently calculates the number of homogenous substrings with a time complexity of ( O(n) ), where ( n ) is the length of the string, and a constant space complexity.

python
class Solution:
    def homogeneousCount(self, s: str) -> int:
        total = 0
        consecutive_count = 0
        MODULO = 10 ** 9 + 7

        for i in range(len(s)):
            if i == 0 or s[i] == s[i - 1]:
                consecutive_count += 1
            else:
                consecutive_count = 1
            
            total = (total + consecutive_count) % MODULO
        
        return total

This Python3 solution focuses on counting the number of homogenous substrings in a given string, where a substring is considered homogenous if it consists of the same character repeated one or more times. The approach leverages a loop to traverse each character of the string, utilizing a consecutive_count variable to keep track of the length of the current homogenous substring. The total variable aggregates the count of all homogenous substrings found during iteration.

  • Initialize two variables: total to accumulate the number of substrings and consecutive_count to count the length of the current homogenous substring.
  • Use a MODULO constant of 10 ** 9 + 7 to handle large numbers by taking the modulo of the total count at every step to prevent overflow.
  • Iterate over each character in the string:
    • If it's the first character or matches the previous character, increase consecutive_count.
    • Otherwise, reset consecutive_count to 1 for a new substring start.
    • Increment total by consecutive_count, applying the modulo to keep the number manageable.
  • The result, stored in total, represents the total count of homogenous substrings modulus 10 ** 9 + 7.

Overall, this approach ensures efficiency and clarity, making optimal use of arithmetic operations and control structures to address the problem comprehensively.

Comments

No comments yet.