
Problem Statement
You are given a 0-indexed string s
along with a list of queries. Each query is represented by a pair of indices [li, ri]
, which specifies a substring of s
starting at index li
and ending at index ri
. For each substring derived from a query, your task is to calculate the number of same-end substrings contained within it.
A substring qualifies as a same-end substring if the first and last characters of the substring are identical. You need to return an array where each element represents the count of same-end substrings for the corresponding query.
Examples
Example 1
Input:
s = "abcaab", queries = [[0,0],[1,4],[2,5],[0,5]]
Output:
[1,5,5,10]
Explanation:
1st query: s[0..0] = "a" - Same-end substrings: ["a"] → Count = 1 2nd query: s[1..4] = "bcaa" - Same-end substrings: ["b", "c", "a", "a", "aa"] → Count = 5 3rd query: s[2..5] = "caab" - Same-end substrings: ["c", "a", "a", "b", "aa"] → Count = 5 4th query: s[0..5] = "abcaab" - Same-end substrings: ["a", "b", "c", "a", "a", "b", "aa", "aa", "aba", "abcaab"] → Count = 10
Example 2
Input:
s = "abcd", queries = [[0,3]]
Output:
[4]
Explanation:
Query: s[0..3] = "abcd" - Same-end substrings: ["a", "b", "c", "d"] → Count = 4
Constraints
2 <= s.length <= 3 * 10^4
s
consists only of lowercase English letters.1 <= queries.length <= 3 * 10^4
queries[i] = [li, ri]
0 <= li <= ri < s.length
Approach and Intuition
To compute the number of same-end substrings for each query efficiently, consider the following:
Observation: A substring is considered a same-end substring if its first and last characters are the same. Every single character is trivially a valid same-end substring.
Brute Force Approach: For each query:
- Extract the substring.
- Enumerate all possible substrings within that range.
- Count those whose first and last character match.
However, this brute-force approach is too slow due to the high limits in constraints.
Optimized Strategy:
Preprocess the string with prefix frequency counts for each character.
For a given query substring, count how many times each character appears.
For each character
ch
appearingk
times in the substring, it can form:k
single-character substringsk choose 2 = k * (k - 1) / 2
multi-length substrings with same start and end.
So the total number of same-end substrings contributed by
ch
is:k + (k * (k - 1)) / 2
This approach ensures O(1) query time after O(n) preprocessing per character.
This method dramatically reduces the time complexity and makes the solution scale properly for large input sizes.
Solutions
- C++
class Solution {
public:
vector<int> calculateSameEndSubstrings(string str, vector<vector<int>>& requests) {
int length = str.length();
// Create prefix sum arrays for each character count
vector<vector<int>> prefixSums(26, vector<int>(length, 0));
// Initial calculation of character frequencies
for (int i = 0; i < length; i++) {
prefixSums[str[i] - 'a'][i]++;
}
// Build prefix sums for character occurrences
for (int c = 0; c < 26; c++) {
for (int j = 1; j < length; j++) {
prefixSums[c][j] += prefixSums[c][j - 1];
}
}
vector<int> outcomes(requests.size());
// Evaluate each query
for (int idx = 0; idx < requests.size(); idx++) {
int start = requests[idx][0];
int end = requests[idx][1];
int substringCount = 0;
// Count substrings ending with the same character
for (int ch = 0; ch < 26; ch++) {
int frequencyAtStart = (start == 0) ? 0 : prefixSums[ch][start - 1];
int frequencyAtEnd = prefixSums[ch][end];
int totalFrequency = frequencyAtEnd - frequencyAtStart;
// Calculate number of substrings based on the frequency
substringCount += totalFrequency * (totalFrequency + 1) / 2;
}
outcomes[idx] = substringCount;
}
return outcomes;
}
};
The provided C++ solution describes a method to calculate the number of substrings ending with the same character according to specified ranges in a string. The calculateSameEndSubstrings
function in the Solution
class takes a string and a list of requests, each defining a start and end range within the string. Follow the approach to understand the logic and operations involved:
First, initialize a
prefixSums
2D vector. It holds the count of each character (total 26 for lowercase alphabets) up to certain positions in the string.Populate the
prefixSums
array by traversing through the string and updating the relevant indexes by incrementing character counts.Convert the array into a prefix sum format: Each value at an index
j
for characterc
becomes the cumulative count from the start of the string to positionj
.Prepare the vector
outcomes
to store results for each query.Process each query from the
requests
vector:- Calculate the start and end range from the request.
- For each character in the alphabet, retrieve the frequency count for the specified range using the
prefixSums
array. - Calculate and sum up the possible substrings' count that could be formed with this frequency by using the formula
n * (n+1) / 2
wheren
is the frequency.
Finally, store the computed results in the
outcomes
vector for each query.
This approach effectively uses the method of prefix sums allowing the function to efficiently handle range queries for substring calculations. The key part of the solution includes managing prefix sums for each character and calculating possible substrings from character frequency over given ranges. Through this, the function returns a list of results matching the requested queries.
- Java
class Solution {
public int[] calculateSubstringFrequency(String str, int[][] queries) {
int length = str.length();
int[][] prefixSumOfCharFreq = new int[26][length];
// Initialize character frequency array
for (int i = 0; i < length; i++) {
prefixSumOfCharFreq[str.charAt(i) - 'a'][i]++;
}
// Create prefix sum from character frequencies
for (int charId = 0; charId < 26; charId++) {
for (int pos = 1; pos < length; pos++) {
prefixSumOfCharFreq[charId][pos] += prefixSumOfCharFreq[charId][pos - 1];
}
}
int[] results = new int[queries.length];
// Evaluate each query to find results
for (int index = 0; index < queries.length; index++) {
int start = queries[index][0];
int end = queries[index][1];
int totalSubstrings = 0;
// Compute substring count for each character
for (int charPos = 0; charPos < 26; charPos++) {
int freqStart = (start == 0) ? 0 : prefixSumOfCharFreq[charPos][start - 1];
int freqEnd = prefixSumOfCharFreq[charPos][end];
int totalFreq = freqEnd - freqStart;
// Count substrings with the same beginning and end
totalSubstrings += (totalFreq * (totalFreq + 1)) / 2;
}
results[index] = totalSubstrings;
}
return results;
}
}
The given Java program defines a Solution
class that contains the method calculateSubstringFrequency
, which takes a string and a list of queries to calculate the frequency of substrings that begin and end with the same character.
- The method calculates the length of the input string and initializes a 2D array
prefixSumOfCharFreq
to store the prefix sums of character frequencies for each character from 'a' to 'z'. - First, the character frequencies are populated by iterating through each character of the string and using the ASCII values to place the frequency count accurately in the array.
- The program then computes the prefix sums for the character frequencies, making it efficient to query the frequency of any character over any substring.
- For each query, which includes start and end indices, the algorithm computes the frequency of every character between these indices by using the prefix sums.
- Using the difference between prefix sums at the end and start indices, the total frequency of each character is calculated for the given range.
- The total substrings with the same start and end characters are then calculated for each character. The logic uses the combinatorial identity for choosing two out of
n
items (n+1 choose 2). - This count is summed up to get the result for each query.
Finally, the results array is returned, which contains the total count of substrings fulfilling the criteria mentioned above for each query. This approach efficiently handles the substring queries by converting them into a frequency counting and prefix sum problem.
- Python
class Solution:
def countSubstringsWithSameEnds(
self, string: str, queries: List[List[int]]
) -> List[int]:
length = len(string)
# Prefix sum array for 'a' to 'z' character frequencies
prefix_sum = [[0] * length for _ in range(26)]
# Initialize the character frequency array
for idx, character in enumerate(string):
prefix_sum[ord(character) - ord('a')][idx] += 1
# Build prefix sum from the frequency data
for character_prefix in prefix_sum:
for j in range(1, length):
character_prefix[j] += character_prefix[j - 1]
answer = []
# Evaluate each query
for start, end in queries:
total_substrings = 0
# Compute the number of same end-letter substrings
for char_prefix in prefix_sum:
if start == 0:
left_freq = 0
else:
left_freq = char_prefix[start - 1]
right_freq = char_prefix[end]
freq_in_range = right_freq - left_freq
# Count possible substrings with the same end
total_substrings += (
freq_in_range * (freq_in_range + 1) // 2
)
answer.append(total_substrings)
return answer
This Python solution provides a method to count substrings within specific ranges where the substrings start and end with the same character. The class Solution
includes a method countSubstringsWithSameEnds
which takes a string and a list of queries as arguments, with each query specifying a [start
, end
] range in the string. The method returns the count of such substrings as a list of integers for each query.
Here's a breakdown of the approach:
Prefix Sum Preparation: First, a 2D
prefix_sum
array is initialized where each row corresponds to a character ('a'
to'z'
) and each column corresponds to indices in the string. This array is populated to reflect the cumulative frequency of each character up to each index in the string.Accuracy in Queries: For each query represented by
start
andend
, the method calculates the frequency of each character within this range using theprefix_sum
data. This is done by calculating the difference between the prefix sums atend
andstart-1
.Counting Valid Substrings: For each character present in the queried range, the number of ways to select substrings that start and end with this character is calculated using the formula for combinations (
n * (n + 1) // 2
, wheren
is the number of occurrences of that character). The sums of these values over all characters give the total count of valid substrings.Efficiency: The use of
prefix_sum
arrays reduces the time complexity significantly as it avoids recalculating the frequency for each query, instead deriving it from precomputed values. This makes the solution efficient even for large input sizes and multiple queries.
This method effectively and efficiently computes the number of substrings with the same ending character per each query, leveraging data preprocessing with prefix sums and thoughtful mathematical derivation for substring combinations.
No comments yet.