
Problem Statement
In this problem, you're provided with a string s
, which is made up strictly of lowercase English letters. Your task is to determine the total count of substrings that both start and end with the same character.
A substring is a sequence of continuous characters within a string and is always non-empty. You must count all such substrings where the first and last characters are identical.
Examples
Example 1
Input:
s = "abcba"
Output:
7
Explanation:
The substrings of length 1 that start and end with the same letter are: "a", "b", "c", "b", "a". The substring of length 3: "bcb". The substring of length 5: "abcba".
Example 2
Input:
s = "abacad"
Output:
9
Explanation:
Substrings of length 1: "a", "b", "a", "c", "a", "d". Substrings of length 3: "aba", "aca". Substring of length 5: "abaca".
Example 3
Input:
s = "a"
Output:
1
Explanation:
Only one substring exists: "a".
Constraints
1 <= s.length <= 10^5
s
consists only of lowercase English letters.
Approach and Intuition
Approach
- Use a frequency counter for each character in
s
. - For each character that appears
f
times, the number of substrings that start and end with it isf * (f + 1) / 2
. - Sum this across all characters to get the final count.
This approach avoids iterating over all possible substrings, making it efficient for large strings.
Intuition
Each time a character appears, it creates new opportunities for substrings that start and end with it. The formula f * (f + 1) / 2
comes from combinatorics — counting how many pairs (i, j) can be formed such that s[i] == s[j]
, and i <= j
.
Solutions
- C++
- Java
- Python
class Solution {
public:
long long countSubstrings(string str) {
long long totalCount = 0;
vector<long long> charFrequency(26, 0);
// Increment frequency for each character in the string
for (char c : str) {
charFrequency[c - 'a']++;
}
// Calculate total substrings that can be formed
for (long long freq : charFrequency) {
// Combinatorial calculation of substrings for each character frequency
totalCount += (freq * (freq + 1)) / 2;
}
return totalCount;
}
};
The provided C++ solution counts the total number of substrings within a string where the substring starts and ends with the same character. The implementation uses a combinatorial approach, exploiting the fact that each group of identical characters contributes to a specific number of palindromic substrings.
Here's the breakdown of the operations performed in the solution:
A 26-element vector named
charFrequency
is initialized to zero, which is used to store the frequency of each letter (assuming only lowercase English alphabets).The code iterates through each character of the input string, updating the frequency of each character in the
charFrequency
array.For each character frequency in
charFrequency
, the calculation(freq * (freq + 1)) / 2
adds up the possible substrings where characters are the same at the beginning and the end. This calculation derives from the combinatorial principle that if a character appearsfreq
times, the number of ways you can pick two positions to start and end a substring is given by the formula for combinations C(n+1, 2), which simplifies to(n * (n + 1)) / 2
.
Through these steps, the function countSubstrings
ultimately returns the total count of substrings meeting the criteria. The use of vector for character counts and the direct formula application allows the function to operate efficiently with a time complexity dependent primarily on the length of the string and linear with respect to the alphabet size.
class Solution {
public long countSubstringsWithAllUniqueChars(String inputStr) {
long totalCount = 0;
long[] charCounts = new long[26];
// Iterate over each character to populate frequency array
for (char c : inputStr.toCharArray()) {
charCounts[c - 'a']++;
}
// Sum up potential substrings for each character
for (long freq : charCounts) {
totalCount += (freq * (freq + 1)) / 2;
}
return totalCount;
}
}
The provided Java solution aims to solve the problem of finding the number of substrings within a given string where the substrings start and end with the same letter. Here’s a concise breakdown of how the code accomplishes this:
The solution defines a
countSubstringsWithAllUniqueChars
method within aSolution
class to count such substrings in a giveninputStr
.Instantiate
totalCount
to keep the count of compliant substrings, which starts at0
.Use a frequency array
charCounts
sized for 26 characters to track how many times each character from 'a' to 'z' appears ininputStr
.Convert each character in the string to its respective position in the
charCounts
array and increment its value accordingly. This captures the total occurrence of each character.Calculate potential substrings for each character. If a character appears 'n' times, it can potentially create
(n * (n + 1)) / 2
substrings where the substrings begin and end with that character. Add these possible substrings tototalCount
.Finally, return
totalCount
to provide the number of substrings that meet the condition.
This approach primarily leverages mathematical combinations to efficiently count potential substrings without explicitly generating them, illustrating an optimal solution for large input strings due to its linear time complexity related to the size of the input.
class Solution:
def countValidSubstrings(self, input_str: str) -> int:
result = 0
char_frequency = [0] * 26
# Update frequency of each character
for char in input_str:
char_frequency[ord(char) - ord('a')] += 1
# Compute total number of substrings that are valid
for freq in char_frequency:
result += (freq * (freq + 1)) // 2
return result
The given Python code aims to solve the problem of counting substrings where the first and last character are the same. This task is achieved by utilizing a frequency array and combinatorial mathematics.
The code creates a method countValidSubstrings
that takes a string input_str
as input and initializes a count result
to 0, and an array char_frequency
sized 26 (assuming the input string consists only of lowercase English letters) to keep track of the frequency of each character.
The process can be divided into two main steps:
- Iterate over each character in
input_str
. For each character, update its frequency count inchar_frequency
by increasing the count index corresponding to the character. - Loop through the
char_frequency
array to calculate the result. For each non-zero frequencyfreq
, compute the number of possible substrings that start and end with that character using the formula(freq * (freq + 1)) // 2
and add it toresult
.
Finally, the method returns the total count result
, which represents the number of valid substrings where the first and last characters are the same. Using this method, we capitalize on efficient character processing and mathematical computation to derive the result.
No comments yet.