Substrings That Begin and End With the Same Letter

Updated on 25 June, 2025
Substrings That Begin and End With the Same Letter header image

Problem Statement

In this problem, you're provided with a string s, which is made up strictly of lowercase English letters. Your task is to determine the total count of substrings that both start and end with the same character.

A substring is a sequence of continuous characters within a string and is always non-empty. You must count all such substrings where the first and last characters are identical.


Examples

Example 1

Input:

s = "abcba"

Output:

7

Explanation:

The substrings of length 1 that start and end with the same letter are: "a", "b", "c", "b", "a".
The substring of length 3: "bcb".
The substring of length 5: "abcba".

Example 2

Input:

s = "abacad"

Output:

9

Explanation:

Substrings of length 1: "a", "b", "a", "c", "a", "d".
Substrings of length 3: "aba", "aca".
Substring of length 5: "abaca".

Example 3

Input:

s = "a"

Output:

1

Explanation:

Only one substring exists: "a".

Constraints

  • 1 <= s.length <= 10^5
  • s consists only of lowercase English letters.

Approach and Intuition

Approach

  1. Use a frequency counter for each character in s.
  2. For each character that appears f times, the number of substrings that start and end with it is f * (f + 1) / 2.
  3. Sum this across all characters to get the final count.

This approach avoids iterating over all possible substrings, making it efficient for large strings.

Intuition

Each time a character appears, it creates new opportunities for substrings that start and end with it. The formula f * (f + 1) / 2 comes from combinatorics — counting how many pairs (i, j) can be formed such that s[i] == s[j], and i <= j.

Solutions

  • C++
  • Java
  • Python
cpp
class Solution {
public:
    long long countSubstrings(string str) {
        long long totalCount = 0;
        vector<long long> charFrequency(26, 0);

        // Increment frequency for each character in the string
        for (char c : str) {
            charFrequency[c - 'a']++;
        }

        // Calculate total substrings that can be formed
        for (long long freq : charFrequency) {
            // Combinatorial calculation of substrings for each character frequency
            totalCount += (freq * (freq + 1)) / 2;
        }

        return totalCount;
    }
};

The provided C++ solution counts the total number of substrings within a string where the substring starts and ends with the same character. The implementation uses a combinatorial approach, exploiting the fact that each group of identical characters contributes to a specific number of palindromic substrings.

Here's the breakdown of the operations performed in the solution:

  • A 26-element vector named charFrequency is initialized to zero, which is used to store the frequency of each letter (assuming only lowercase English alphabets).

  • The code iterates through each character of the input string, updating the frequency of each character in the charFrequency array.

  • For each character frequency in charFrequency, the calculation (freq * (freq + 1)) / 2 adds up the possible substrings where characters are the same at the beginning and the end. This calculation derives from the combinatorial principle that if a character appears freq times, the number of ways you can pick two positions to start and end a substring is given by the formula for combinations C(n+1, 2), which simplifies to (n * (n + 1)) / 2.

Through these steps, the function countSubstrings ultimately returns the total count of substrings meeting the criteria. The use of vector for character counts and the direct formula application allows the function to operate efficiently with a time complexity dependent primarily on the length of the string and linear with respect to the alphabet size.

java
class Solution {

    public long countSubstringsWithAllUniqueChars(String inputStr) {
        long totalCount = 0;
        long[] charCounts = new long[26];

        // Iterate over each character to populate frequency array
        for (char c : inputStr.toCharArray()) {
            charCounts[c - 'a']++;
        }

        // Sum up potential substrings for each character
        for (long freq : charCounts) {
            totalCount += (freq * (freq + 1)) / 2;
        }

        return totalCount;
    }
}

The provided Java solution aims to solve the problem of finding the number of substrings within a given string where the substrings start and end with the same letter. Here’s a concise breakdown of how the code accomplishes this:

  • The solution defines a countSubstringsWithAllUniqueChars method within a Solution class to count such substrings in a given inputStr.

  • Instantiate totalCount to keep the count of compliant substrings, which starts at 0.

  • Use a frequency array charCounts sized for 26 characters to track how many times each character from 'a' to 'z' appears in inputStr.

  • Convert each character in the string to its respective position in the charCounts array and increment its value accordingly. This captures the total occurrence of each character.

  • Calculate potential substrings for each character. If a character appears 'n' times, it can potentially create (n * (n + 1)) / 2 substrings where the substrings begin and end with that character. Add these possible substrings to totalCount.

  • Finally, return totalCount to provide the number of substrings that meet the condition.

This approach primarily leverages mathematical combinations to efficiently count potential substrings without explicitly generating them, illustrating an optimal solution for large input strings due to its linear time complexity related to the size of the input.

python
class Solution:
    def countValidSubstrings(self, input_str: str) -> int:
        result = 0
        char_frequency = [0] * 26

        # Update frequency of each character
        for char in input_str:
            char_frequency[ord(char) - ord('a')] += 1

        # Compute total number of substrings that are valid
        for freq in char_frequency:
            result += (freq * (freq + 1)) // 2

        return result

The given Python code aims to solve the problem of counting substrings where the first and last character are the same. This task is achieved by utilizing a frequency array and combinatorial mathematics.

The code creates a method countValidSubstrings that takes a string input_str as input and initializes a count result to 0, and an array char_frequency sized 26 (assuming the input string consists only of lowercase English letters) to keep track of the frequency of each character.

The process can be divided into two main steps:

  1. Iterate over each character in input_str. For each character, update its frequency count in char_frequency by increasing the count index corresponding to the character.
  2. Loop through the char_frequency array to calculate the result. For each non-zero frequency freq, compute the number of possible substrings that start and end with that character using the formula (freq * (freq + 1)) // 2 and add it to result.

Finally, the method returns the total count result, which represents the number of valid substrings where the first and last characters are the same. Using this method, we capitalize on efficient character processing and mathematical computation to derive the result.

Comments

No comments yet.