Count Number of Homogenous Substrings

Problem Statement
In this problem, we are given a string s and are tasked with counting the number of its homogenous substrings. A homogenous substring is defined as a sequence of characters where all characters are the same. Each contiguous substring that fits this definition should be counted. The solution's result must be returned as a modulo 10^9 + 7, to handle potentially large numbers due to the constraints on the string length.
Exploring the definition through examples:
- For a string "aaa", the homogenous substrings include "a", "a", "a", "aa", "aa", and "aaa". Each individual occurrence of a character as well as each possible contiguous grouping of identical characters should be considered.
Given the constraints, where the length can be as large as 100,000 and only lowercase letters are involved, efficient string processing is critical to manage potential time complexities.
Examples
Example 1
Input:
s = "abbcccaa"
Output:
13
Explanation:
The homogenous substrings are listed as below: "a" appears 3 times. "aa" appears 1 time. "b" appears 2 times. "bb" appears 1 time. "c" appears 3 times. "cc" appears 2 times. "ccc" appears 1 time. 3 + 1 + 2 + 1 + 3 + 2 + 1 = 13.
Example 2
Input:
s = "xy"
Output:
2
Explanation:
The homogenous substrings are "x" and "y".
Example 3
Input:
s = "zzzzz"
Output:
15
Constraints
1 <= s.length <= 105sconsists of lowercase letters.
Approach and Intuition
The task is to identify all contiguous blocks of the same characters and determine the count of all possible substrings within these blocks.
Steps to Solve:
Traverse through the string
susing a pointer or index.Start by identifying the beginning of a homogenous block. This is easily spotted when the current character does not match the previous character (or if we are at the start of the string).
Maintain a count of how many times the same character appears consecutively. Let’s call this length
Lfor any block.Using the property that a single character can form 1 substring, two consecutive characters can form
1 + 2 = 3substrings, and so on, the total number of substrings for a homogenous block of lengthLcan be computed using the formulaL * (L + 1) / 2.As you detect the end of a homogenous block (either when
s[i]is not equal tos[i-1]or the string has ended), compute and add the result ofL * (L + 1) / 2to the total count of substrings.Be sure to include the result modulo
10^9 + 7at each addition to keep numbers manageable and handle the overflow.
Example Analysis:
- For
s = "abbcccaa", the sequence of operations will be:- Find 'a'; length is 1, yielding
1substring. - Find 'bb'; length is 2, yielding
2*(2+1)/2 = 3substrings. - And so forth, until the end of the string.
- Find 'a'; length is 1, yielding
This approach minimizes the need to store or directly count each seen substring, instead leveraging mathematical properties to efficiently compute required values based on identified patterns in the data, adhering well to the problem’s constraints.
Solutions
- C++
- Java
- Python
class Solution {
public:
int countContiguous(string str) {
int total = 0;
int streak = 0;
int CONST_MOD = 1000000007;
for (int index = 0; index < str.length(); index++) {
if (index == 0 || str[index] == str[index - 1]) {
streak++;
} else {
streak = 1;
}
total = (total + streak) % CONST_MOD;
}
return total;
}
};
In this solution summary for the problem of counting the number of homogeneous substrings, the code is implemented in C++.
Discover how to process a given string to count all contiguous substrings where all characters are the same using the countContiguous function. This function:
- Initializes a variable
totalto keep track of the total number of homogeneous substrings. - Initializes a
streakvariable to count the length of the current homogeneous substring. - Defines a constant
CONST_MODset to1000000007to ensure the result remains manageable and to handle large values by taking moduloCONST_MOD.
Explore the use of a for loop to iterate over each character in the string:
- For the first character or if the current character is the same as the previous, increment the
streak. - If the character is different, reset the
streakto 1. - After updating the
streak, add its value tototal, and take moduloCONST_MODto update thetotal.
Finally:
- The function returns the
total, which represents the count of all homogeneous substrig in the provided string.
class Solution {
public int calculateHomogenousSubstrings(String str) {
int total = 0;
int streak = 0;
int MODULO = (int) 1e9 + 7;
for (int i = 0; i < str.length(); i++) {
if (i == 0 || str.charAt(i) == str.charAt(i - 1)) {
streak++;
} else {
streak = 1;
}
total = (total + streak) % MODULO;
}
return total;
}
}
This summary guides you through the Java solution for counting the number of homogenous substrings in a given string.
Start with setting up initial variables.
totaltracks the cumulative count of homogenous substrings,streakcounts the length of the current sequence of identical characters, andMODULOis used to avoid overflow by performing modulo operation with (10^9 + 7).Iterate through the provided string using a
forloop. On each iteration, check if the current character matches the previous character (or if it's the first character being checked). If they match or if it's the first character, increment thestreakcount. Otherwise, resetstreakto 1, meaning a new sequence of identical characters starts here.After determining the streak value, add it to the
totalcount. To ensure the number remains within bounds and prevent overflow, use the modulo operation withMODULO.At the end of the loop, return the value of
total, which represents the count of all homogenous substrings in the string.
This approach efficiently calculates the number of homogenous substrings with a time complexity of ( O(n) ), where ( n ) is the length of the string, and a constant space complexity.
class Solution:
def homogeneousCount(self, s: str) -> int:
total = 0
consecutive_count = 0
MODULO = 10 ** 9 + 7
for i in range(len(s)):
if i == 0 or s[i] == s[i - 1]:
consecutive_count += 1
else:
consecutive_count = 1
total = (total + consecutive_count) % MODULO
return total
This Python3 solution focuses on counting the number of homogenous substrings in a given string, where a substring is considered homogenous if it consists of the same character repeated one or more times. The approach leverages a loop to traverse each character of the string, utilizing a consecutive_count variable to keep track of the length of the current homogenous substring. The total variable aggregates the count of all homogenous substrings found during iteration.
- Initialize two variables:
totalto accumulate the number of substrings andconsecutive_countto count the length of the current homogenous substring. - Use a
MODULOconstant of10 ** 9 + 7to handle large numbers by taking the modulo of the total count at every step to prevent overflow. - Iterate over each character in the string:
- If it's the first character or matches the previous character, increase
consecutive_count. - Otherwise, reset
consecutive_countto 1 for a new substring start. - Increment
totalbyconsecutive_count, applying the modulo to keep the number manageable.
- If it's the first character or matches the previous character, increase
- The result, stored in
total, represents the total count of homogenous substrings modulus10 ** 9 + 7.
Overall, this approach ensures efficiency and clarity, making optimal use of arithmetic operations and control structures to address the problem comprehensively.