Number of Same-End Substrings

Updated on 15 July, 2025
Number of Same-End Substrings header image

Problem Statement

You are given a 0-indexed string s along with a list of queries. Each query is represented by a pair of indices [li, ri], which specifies a substring of s starting at index li and ending at index ri. For each substring derived from a query, your task is to calculate the number of same-end substrings contained within it.

A substring qualifies as a same-end substring if the first and last characters of the substring are identical. You need to return an array where each element represents the count of same-end substrings for the corresponding query.

Examples

Example 1

Input:

s = "abcaab", queries = [[0,0],[1,4],[2,5],[0,5]]

Output:

[1,5,5,10]

Explanation:

1st query: s[0..0] = "a"
- Same-end substrings: ["a"]
→ Count = 1

2nd query: s[1..4] = "bcaa"
- Same-end substrings: ["b", "c", "a", "a", "aa"]
→ Count = 5

3rd query: s[2..5] = "caab"
- Same-end substrings: ["c", "a", "a", "b", "aa"]
→ Count = 5

4th query: s[0..5] = "abcaab"
- Same-end substrings: ["a", "b", "c", "a", "a", "b", "aa", "aa", "aba", "abcaab"]
→ Count = 10

Example 2

Input:

s = "abcd", queries = [[0,3]]

Output:

[4]

Explanation:

Query: s[0..3] = "abcd"
- Same-end substrings: ["a", "b", "c", "d"]
→ Count = 4

Constraints

  • 2 <= s.length <= 3 * 10^4
  • s consists only of lowercase English letters.
  • 1 <= queries.length <= 3 * 10^4
  • queries[i] = [li, ri]
  • 0 <= li <= ri < s.length

Approach and Intuition

To compute the number of same-end substrings for each query efficiently, consider the following:

  1. Observation: A substring is considered a same-end substring if its first and last characters are the same. Every single character is trivially a valid same-end substring.

  2. Brute Force Approach: For each query:

    • Extract the substring.
    • Enumerate all possible substrings within that range.
    • Count those whose first and last character match.

    However, this brute-force approach is too slow due to the high limits in constraints.

  3. Optimized Strategy:

    • Preprocess the string with prefix frequency counts for each character.

    • For a given query substring, count how many times each character appears.

    • For each character ch appearing k times in the substring, it can form:

      • k single-character substrings
      • k choose 2 = k * (k - 1) / 2 multi-length substrings with same start and end.
    • So the total number of same-end substrings contributed by ch is: k + (k * (k - 1)) / 2

    This approach ensures O(1) query time after O(n) preprocessing per character.

This method dramatically reduces the time complexity and makes the solution scale properly for large input sizes.

Solutions

  • C++
cpp
class Solution {
public:
    vector<int> calculateSameEndSubstrings(string str, vector<vector<int>>& requests) {
        int length = str.length();
        // Create prefix sum arrays for each character count
        vector<vector<int>> prefixSums(26, vector<int>(length, 0));
    
        // Initial calculation of character frequencies
        for (int i = 0; i < length; i++) {
            prefixSums[str[i] - 'a'][i]++;
        }
    
        // Build prefix sums for character occurrences
        for (int c = 0; c < 26; c++) {
            for (int j = 1; j < length; j++) {
                prefixSums[c][j] += prefixSums[c][j - 1];
            }
        }
    
        vector<int> outcomes(requests.size());
    
        // Evaluate each query
        for (int idx = 0; idx < requests.size(); idx++) {
            int start = requests[idx][0];
            int end = requests[idx][1];
            int substringCount = 0;
    
            // Count substrings ending with the same character
            for (int ch = 0; ch < 26; ch++) {
                int frequencyAtStart = (start == 0) ? 0 : prefixSums[ch][start - 1];
                int frequencyAtEnd = prefixSums[ch][end];
                int totalFrequency = frequencyAtEnd - frequencyAtStart;
    
                // Calculate number of substrings based on the frequency
                substringCount += totalFrequency * (totalFrequency + 1) / 2;
            }
    
            outcomes[idx] = substringCount;
        }
    
        return outcomes;
    }
};

The provided C++ solution describes a method to calculate the number of substrings ending with the same character according to specified ranges in a string. The calculateSameEndSubstrings function in the Solution class takes a string and a list of requests, each defining a start and end range within the string. Follow the approach to understand the logic and operations involved:

  1. First, initialize a prefixSums 2D vector. It holds the count of each character (total 26 for lowercase alphabets) up to certain positions in the string.

  2. Populate the prefixSums array by traversing through the string and updating the relevant indexes by incrementing character counts.

  3. Convert the array into a prefix sum format: Each value at an index j for character c becomes the cumulative count from the start of the string to position j.

  4. Prepare the vector outcomes to store results for each query.

  5. Process each query from the requests vector:

    • Calculate the start and end range from the request.
    • For each character in the alphabet, retrieve the frequency count for the specified range using the prefixSums array.
    • Calculate and sum up the possible substrings' count that could be formed with this frequency by using the formula n * (n+1) / 2 where n is the frequency.
  6. Finally, store the computed results in the outcomes vector for each query.

This approach effectively uses the method of prefix sums allowing the function to efficiently handle range queries for substring calculations. The key part of the solution includes managing prefix sums for each character and calculating possible substrings from character frequency over given ranges. Through this, the function returns a list of results matching the requested queries.

  • Java
java
class Solution {
    
    public int[] calculateSubstringFrequency(String str, int[][] queries) {
        int length = str.length();
        int[][] prefixSumOfCharFreq = new int[26][length];
    
        // Initialize character frequency array
        for (int i = 0; i < length; i++) {
            prefixSumOfCharFreq[str.charAt(i) - 'a'][i]++;
        }
    
        // Create prefix sum from character frequencies
        for (int charId = 0; charId < 26; charId++) {
            for (int pos = 1; pos < length; pos++) {
                prefixSumOfCharFreq[charId][pos] += prefixSumOfCharFreq[charId][pos - 1];
            }
        }
    
        int[] results = new int[queries.length];
    
        // Evaluate each query to find results
        for (int index = 0; index < queries.length; index++) {
            int start = queries[index][0];
            int end = queries[index][1];
            int totalSubstrings = 0;
    
            // Compute substring count for each character
            for (int charPos = 0; charPos < 26; charPos++) {
                int freqStart = (start == 0) ? 0 : prefixSumOfCharFreq[charPos][start - 1];
                int freqEnd = prefixSumOfCharFreq[charPos][end];
                int totalFreq = freqEnd - freqStart;
    
                // Count substrings with the same beginning and end
                totalSubstrings += (totalFreq * (totalFreq + 1)) / 2;
            }
    
            results[index] = totalSubstrings;
        }
    
        return results;
    }
}

The given Java program defines a Solution class that contains the method calculateSubstringFrequency, which takes a string and a list of queries to calculate the frequency of substrings that begin and end with the same character.

  • The method calculates the length of the input string and initializes a 2D array prefixSumOfCharFreq to store the prefix sums of character frequencies for each character from 'a' to 'z'.
  • First, the character frequencies are populated by iterating through each character of the string and using the ASCII values to place the frequency count accurately in the array.
  • The program then computes the prefix sums for the character frequencies, making it efficient to query the frequency of any character over any substring.
  • For each query, which includes start and end indices, the algorithm computes the frequency of every character between these indices by using the prefix sums.
  • Using the difference between prefix sums at the end and start indices, the total frequency of each character is calculated for the given range.
  • The total substrings with the same start and end characters are then calculated for each character. The logic uses the combinatorial identity for choosing two out of n items (n+1 choose 2).
  • This count is summed up to get the result for each query.

Finally, the results array is returned, which contains the total count of substrings fulfilling the criteria mentioned above for each query. This approach efficiently handles the substring queries by converting them into a frequency counting and prefix sum problem.

  • Python
python
class Solution:
    def countSubstringsWithSameEnds(
        self, string: str, queries: List[List[int]]
    ) -> List[int]:
        length = len(string)
        # Prefix sum array for 'a' to 'z' character frequencies
        prefix_sum = [[0] * length for _ in range(26)]
    
        # Initialize the character frequency array
        for idx, character in enumerate(string):
            prefix_sum[ord(character) - ord('a')][idx] += 1
    
        # Build prefix sum from the frequency data
        for character_prefix in prefix_sum:
            for j in range(1, length):
                character_prefix[j] += character_prefix[j - 1]
    
        answer = []
    
        # Evaluate each query
        for start, end in queries:
            total_substrings = 0
    
            # Compute the number of same end-letter substrings
            for char_prefix in prefix_sum:
                if start == 0:
                    left_freq = 0
                else:
                    left_freq = char_prefix[start - 1]
                right_freq = char_prefix[end]
                freq_in_range = right_freq - left_freq
    
                # Count possible substrings with the same end
                total_substrings += (
                    freq_in_range * (freq_in_range + 1) // 2
                )
    
            answer.append(total_substrings)
    
        return answer

This Python solution provides a method to count substrings within specific ranges where the substrings start and end with the same character. The class Solution includes a method countSubstringsWithSameEnds which takes a string and a list of queries as arguments, with each query specifying a [start, end] range in the string. The method returns the count of such substrings as a list of integers for each query.

Here's a breakdown of the approach:

  • Prefix Sum Preparation: First, a 2D prefix_sum array is initialized where each row corresponds to a character ('a' to 'z') and each column corresponds to indices in the string. This array is populated to reflect the cumulative frequency of each character up to each index in the string.

  • Accuracy in Queries: For each query represented by start and end, the method calculates the frequency of each character within this range using the prefix_sum data. This is done by calculating the difference between the prefix sums at end and start-1.

  • Counting Valid Substrings: For each character present in the queried range, the number of ways to select substrings that start and end with this character is calculated using the formula for combinations (n * (n + 1) // 2, where n is the number of occurrences of that character). The sums of these values over all characters give the total count of valid substrings.

  • Efficiency: The use of prefix_sum arrays reduces the time complexity significantly as it avoids recalculating the frequency for each query, instead deriving it from precomputed values. This makes the solution efficient even for large input sizes and multiple queries.

This method effectively and efficiently computes the number of substrings with the same ending character per each query, leveraging data preprocessing with prefix sums and thoughtful mathematical derivation for substring combinations.

Comments

No comments yet.