
Problem Statement
In this problem, we are given a string s
and are tasked with counting the number of its homogenous substrings. A homogenous substring is defined as a sequence of characters where all characters are the same. Each contiguous substring that fits this definition should be counted. The solution's result must be returned as a modulo 10^9 + 7
, to handle potentially large numbers due to the constraints on the string length.
Exploring the definition through examples:
- For a string "aaa", the homogenous substrings include "a", "a", "a", "aa", "aa", and "aaa". Each individual occurrence of a character as well as each possible contiguous grouping of identical characters should be considered.
Given the constraints, where the length can be as large as 100,000 and only lowercase letters are involved, efficient string processing is critical to manage potential time complexities.
Examples
Example 1
Input:
s = "abbcccaa"
Output:
13
Explanation:
The homogenous substrings are listed as below: "a" appears 3 times. "aa" appears 1 time. "b" appears 2 times. "bb" appears 1 time. "c" appears 3 times. "cc" appears 2 times. "ccc" appears 1 time. 3 + 1 + 2 + 1 + 3 + 2 + 1 = 13.
Example 2
Input:
s = "xy"
Output:
2
Explanation:
The homogenous substrings are "x" and "y".
Example 3
Input:
s = "zzzzz"
Output:
15
Constraints
1 <= s.length <= 105
s
consists of lowercase letters.
Approach and Intuition
The task is to identify all contiguous blocks of the same characters and determine the count of all possible substrings within these blocks.
Steps to Solve:
Traverse through the string
s
using a pointer or index.Start by identifying the beginning of a homogenous block. This is easily spotted when the current character does not match the previous character (or if we are at the start of the string).
Maintain a count of how many times the same character appears consecutively. Let’s call this length
L
for any block.Using the property that a single character can form 1 substring, two consecutive characters can form
1 + 2 = 3
substrings, and so on, the total number of substrings for a homogenous block of lengthL
can be computed using the formulaL * (L + 1) / 2
.As you detect the end of a homogenous block (either when
s[i]
is not equal tos[i-1]
or the string has ended), compute and add the result ofL * (L + 1) / 2
to the total count of substrings.Be sure to include the result modulo
10^9 + 7
at each addition to keep numbers manageable and handle the overflow.
Example Analysis:
- For
s = "abbcccaa"
, the sequence of operations will be:- Find 'a'; length is 1, yielding
1
substring. - Find 'bb'; length is 2, yielding
2*(2+1)/2 = 3
substrings. - And so forth, until the end of the string.
- Find 'a'; length is 1, yielding
This approach minimizes the need to store or directly count each seen substring, instead leveraging mathematical properties to efficiently compute required values based on identified patterns in the data, adhering well to the problem’s constraints.
Solutions
- C++
- Java
- Python
class Solution {
public:
int countContiguous(string str) {
int total = 0;
int streak = 0;
int CONST_MOD = 1000000007;
for (int index = 0; index < str.length(); index++) {
if (index == 0 || str[index] == str[index - 1]) {
streak++;
} else {
streak = 1;
}
total = (total + streak) % CONST_MOD;
}
return total;
}
};
In this solution summary for the problem of counting the number of homogeneous substrings, the code is implemented in C++.
Discover how to process a given string to count all contiguous substrings where all characters are the same using the countContiguous
function. This function:
- Initializes a variable
total
to keep track of the total number of homogeneous substrings. - Initializes a
streak
variable to count the length of the current homogeneous substring. - Defines a constant
CONST_MOD
set to1000000007
to ensure the result remains manageable and to handle large values by taking moduloCONST_MOD
.
Explore the use of a for loop to iterate over each character in the string:
- For the first character or if the current character is the same as the previous, increment the
streak
. - If the character is different, reset the
streak
to 1. - After updating the
streak
, add its value tototal
, and take moduloCONST_MOD
to update thetotal
.
Finally:
- The function returns the
total
, which represents the count of all homogeneous substrig in the provided string.
class Solution {
public int calculateHomogenousSubstrings(String str) {
int total = 0;
int streak = 0;
int MODULO = (int) 1e9 + 7;
for (int i = 0; i < str.length(); i++) {
if (i == 0 || str.charAt(i) == str.charAt(i - 1)) {
streak++;
} else {
streak = 1;
}
total = (total + streak) % MODULO;
}
return total;
}
}
This summary guides you through the Java solution for counting the number of homogenous substrings in a given string.
Start with setting up initial variables.
total
tracks the cumulative count of homogenous substrings,streak
counts the length of the current sequence of identical characters, andMODULO
is used to avoid overflow by performing modulo operation with (10^9 + 7).Iterate through the provided string using a
for
loop. On each iteration, check if the current character matches the previous character (or if it's the first character being checked). If they match or if it's the first character, increment thestreak
count. Otherwise, resetstreak
to 1, meaning a new sequence of identical characters starts here.After determining the streak value, add it to the
total
count. To ensure the number remains within bounds and prevent overflow, use the modulo operation withMODULO
.At the end of the loop, return the value of
total
, which represents the count of all homogenous substrings in the string.
This approach efficiently calculates the number of homogenous substrings with a time complexity of ( O(n) ), where ( n ) is the length of the string, and a constant space complexity.
class Solution:
def homogeneousCount(self, s: str) -> int:
total = 0
consecutive_count = 0
MODULO = 10 ** 9 + 7
for i in range(len(s)):
if i == 0 or s[i] == s[i - 1]:
consecutive_count += 1
else:
consecutive_count = 1
total = (total + consecutive_count) % MODULO
return total
This Python3 solution focuses on counting the number of homogenous substrings in a given string, where a substring is considered homogenous if it consists of the same character repeated one or more times. The approach leverages a loop to traverse each character of the string, utilizing a consecutive_count
variable to keep track of the length of the current homogenous substring. The total
variable aggregates the count of all homogenous substrings found during iteration.
- Initialize two variables:
total
to accumulate the number of substrings andconsecutive_count
to count the length of the current homogenous substring. - Use a
MODULO
constant of10 ** 9 + 7
to handle large numbers by taking the modulo of the total count at every step to prevent overflow. - Iterate over each character in the string:
- If it's the first character or matches the previous character, increase
consecutive_count
. - Otherwise, reset
consecutive_count
to 1 for a new substring start. - Increment
total
byconsecutive_count
, applying the modulo to keep the number manageable.
- If it's the first character or matches the previous character, increase
- The result, stored in
total
, represents the total count of homogenous substrings modulus10 ** 9 + 7
.
Overall, this approach ensures efficiency and clarity, making optimal use of arithmetic operations and control structures to address the problem comprehensively.
No comments yet.