
Problem Statement
You are given a 0-indexed string s along with a list of queries. Each query is represented by a pair of indices [li, ri], which specifies a substring of s starting at index li and ending at index ri. For each substring derived from a query, your task is to calculate the number of same-end substrings contained within it.
A substring qualifies as a same-end substring if the first and last characters of the substring are identical. You need to return an array where each element represents the count of same-end substrings for the corresponding query.
Examples
Example 1
Input:
s = "abcaab", queries = [[0,0],[1,4],[2,5],[0,5]]
Output:
[1,5,5,10]
Explanation:
1st query: s[0..0] = "a" - Same-end substrings: ["a"] → Count = 1 2nd query: s[1..4] = "bcaa" - Same-end substrings: ["b", "c", "a", "a", "aa"] → Count = 5 3rd query: s[2..5] = "caab" - Same-end substrings: ["c", "a", "a", "b", "aa"] → Count = 5 4th query: s[0..5] = "abcaab" - Same-end substrings: ["a", "b", "c", "a", "a", "b", "aa", "aa", "aba", "abcaab"] → Count = 10
Example 2
Input:
s = "abcd", queries = [[0,3]]
Output:
[4]
Explanation:
Query: s[0..3] = "abcd" - Same-end substrings: ["a", "b", "c", "d"] → Count = 4
Constraints
2 <= s.length <= 3 * 10^4sconsists only of lowercase English letters.1 <= queries.length <= 3 * 10^4queries[i] = [li, ri]0 <= li <= ri < s.length
Approach and Intuition
To compute the number of same-end substrings for each query efficiently, consider the following:
Observation: A substring is considered a same-end substring if its first and last characters are the same. Every single character is trivially a valid same-end substring.
Brute Force Approach: For each query:
- Extract the substring.
- Enumerate all possible substrings within that range.
- Count those whose first and last character match.
However, this brute-force approach is too slow due to the high limits in constraints.
Optimized Strategy:
Preprocess the string with prefix frequency counts for each character.
For a given query substring, count how many times each character appears.
For each character
chappearingktimes in the substring, it can form:ksingle-character substringsk choose 2 = k * (k - 1) / 2multi-length substrings with same start and end.
So the total number of same-end substrings contributed by
chis:k + (k * (k - 1)) / 2
This approach ensures O(1) query time after O(n) preprocessing per character.
This method dramatically reduces the time complexity and makes the solution scale properly for large input sizes.
Solutions
- C++
class Solution {
public:
vector<int> calculateSameEndSubstrings(string str, vector<vector<int>>& requests) {
int length = str.length();
// Create prefix sum arrays for each character count
vector<vector<int>> prefixSums(26, vector<int>(length, 0));
// Initial calculation of character frequencies
for (int i = 0; i < length; i++) {
prefixSums[str[i] - 'a'][i]++;
}
// Build prefix sums for character occurrences
for (int c = 0; c < 26; c++) {
for (int j = 1; j < length; j++) {
prefixSums[c][j] += prefixSums[c][j - 1];
}
}
vector<int> outcomes(requests.size());
// Evaluate each query
for (int idx = 0; idx < requests.size(); idx++) {
int start = requests[idx][0];
int end = requests[idx][1];
int substringCount = 0;
// Count substrings ending with the same character
for (int ch = 0; ch < 26; ch++) {
int frequencyAtStart = (start == 0) ? 0 : prefixSums[ch][start - 1];
int frequencyAtEnd = prefixSums[ch][end];
int totalFrequency = frequencyAtEnd - frequencyAtStart;
// Calculate number of substrings based on the frequency
substringCount += totalFrequency * (totalFrequency + 1) / 2;
}
outcomes[idx] = substringCount;
}
return outcomes;
}
};
The provided C++ solution describes a method to calculate the number of substrings ending with the same character according to specified ranges in a string. The calculateSameEndSubstrings function in the Solution class takes a string and a list of requests, each defining a start and end range within the string. Follow the approach to understand the logic and operations involved:
First, initialize a
prefixSums2D vector. It holds the count of each character (total 26 for lowercase alphabets) up to certain positions in the string.Populate the
prefixSumsarray by traversing through the string and updating the relevant indexes by incrementing character counts.Convert the array into a prefix sum format: Each value at an index
jfor charactercbecomes the cumulative count from the start of the string to positionj.Prepare the vector
outcomesto store results for each query.Process each query from the
requestsvector:- Calculate the start and end range from the request.
- For each character in the alphabet, retrieve the frequency count for the specified range using the
prefixSumsarray. - Calculate and sum up the possible substrings' count that could be formed with this frequency by using the formula
n * (n+1) / 2wherenis the frequency.
Finally, store the computed results in the
outcomesvector for each query.
This approach effectively uses the method of prefix sums allowing the function to efficiently handle range queries for substring calculations. The key part of the solution includes managing prefix sums for each character and calculating possible substrings from character frequency over given ranges. Through this, the function returns a list of results matching the requested queries.
- Java
class Solution {
public int[] calculateSubstringFrequency(String str, int[][] queries) {
int length = str.length();
int[][] prefixSumOfCharFreq = new int[26][length];
// Initialize character frequency array
for (int i = 0; i < length; i++) {
prefixSumOfCharFreq[str.charAt(i) - 'a'][i]++;
}
// Create prefix sum from character frequencies
for (int charId = 0; charId < 26; charId++) {
for (int pos = 1; pos < length; pos++) {
prefixSumOfCharFreq[charId][pos] += prefixSumOfCharFreq[charId][pos - 1];
}
}
int[] results = new int[queries.length];
// Evaluate each query to find results
for (int index = 0; index < queries.length; index++) {
int start = queries[index][0];
int end = queries[index][1];
int totalSubstrings = 0;
// Compute substring count for each character
for (int charPos = 0; charPos < 26; charPos++) {
int freqStart = (start == 0) ? 0 : prefixSumOfCharFreq[charPos][start - 1];
int freqEnd = prefixSumOfCharFreq[charPos][end];
int totalFreq = freqEnd - freqStart;
// Count substrings with the same beginning and end
totalSubstrings += (totalFreq * (totalFreq + 1)) / 2;
}
results[index] = totalSubstrings;
}
return results;
}
}
The given Java program defines a Solution class that contains the method calculateSubstringFrequency, which takes a string and a list of queries to calculate the frequency of substrings that begin and end with the same character.
- The method calculates the length of the input string and initializes a 2D array
prefixSumOfCharFreqto store the prefix sums of character frequencies for each character from 'a' to 'z'. - First, the character frequencies are populated by iterating through each character of the string and using the ASCII values to place the frequency count accurately in the array.
- The program then computes the prefix sums for the character frequencies, making it efficient to query the frequency of any character over any substring.
- For each query, which includes start and end indices, the algorithm computes the frequency of every character between these indices by using the prefix sums.
- Using the difference between prefix sums at the end and start indices, the total frequency of each character is calculated for the given range.
- The total substrings with the same start and end characters are then calculated for each character. The logic uses the combinatorial identity for choosing two out of
nitems (n+1 choose 2). - This count is summed up to get the result for each query.
Finally, the results array is returned, which contains the total count of substrings fulfilling the criteria mentioned above for each query. This approach efficiently handles the substring queries by converting them into a frequency counting and prefix sum problem.
- Python
class Solution:
def countSubstringsWithSameEnds(
self, string: str, queries: List[List[int]]
) -> List[int]:
length = len(string)
# Prefix sum array for 'a' to 'z' character frequencies
prefix_sum = [[0] * length for _ in range(26)]
# Initialize the character frequency array
for idx, character in enumerate(string):
prefix_sum[ord(character) - ord('a')][idx] += 1
# Build prefix sum from the frequency data
for character_prefix in prefix_sum:
for j in range(1, length):
character_prefix[j] += character_prefix[j - 1]
answer = []
# Evaluate each query
for start, end in queries:
total_substrings = 0
# Compute the number of same end-letter substrings
for char_prefix in prefix_sum:
if start == 0:
left_freq = 0
else:
left_freq = char_prefix[start - 1]
right_freq = char_prefix[end]
freq_in_range = right_freq - left_freq
# Count possible substrings with the same end
total_substrings += (
freq_in_range * (freq_in_range + 1) // 2
)
answer.append(total_substrings)
return answer
This Python solution provides a method to count substrings within specific ranges where the substrings start and end with the same character. The class Solution includes a method countSubstringsWithSameEnds which takes a string and a list of queries as arguments, with each query specifying a [start, end] range in the string. The method returns the count of such substrings as a list of integers for each query.
Here's a breakdown of the approach:
Prefix Sum Preparation: First, a 2D
prefix_sumarray is initialized where each row corresponds to a character ('a'to'z') and each column corresponds to indices in the string. This array is populated to reflect the cumulative frequency of each character up to each index in the string.Accuracy in Queries: For each query represented by
startandend, the method calculates the frequency of each character within this range using theprefix_sumdata. This is done by calculating the difference between the prefix sums atendandstart-1.Counting Valid Substrings: For each character present in the queried range, the number of ways to select substrings that start and end with this character is calculated using the formula for combinations (
n * (n + 1) // 2, wherenis the number of occurrences of that character). The sums of these values over all characters give the total count of valid substrings.Efficiency: The use of
prefix_sumarrays reduces the time complexity significantly as it avoids recalculating the frequency for each query, instead deriving it from precomputed values. This makes the solution efficient even for large input sizes and multiple queries.
This method effectively and efficiently computes the number of substrings with the same ending character per each query, leveraging data preprocessing with prefix sums and thoughtful mathematical derivation for substring combinations.