Distinct Subsequences II

Updated on 23 May, 2025

Problem Statement

Given a string s, calculate the number of distinct non-empty subsequences of the string. A subsequence of a string is derived by deleting zero or more characters from the string without reordering the remaining characters. For example, "ace" is a valid subsequence of "abcde", while "aec" is not because the order of the characters is changed. Because the result can be very large, return the count of distinct subsequences modulo (10^9 + 7).

Examples

Example 1

Input:

s = "abc"

Output:

7

Explanation:

The 7 distinct subsequences are "a", "b", "c", "ab", "ac", "bc", and "abc".

Example 2

Input:

s = "aba"

Output:

6

Explanation:

The 6 distinct subsequences are "a", "b", "ab", "aa", "ba", and "aba".

Example 3

Input:

s = "aaa"

Output:

3

Explanation:

The 3 distinct subsequences are "a", "aa" and "aaa".
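
The three examples above can be checked by brute force. The following is a minimal sketch, not part of the solution proper: bruteForceDistinct is a hypothetical helper that enumerates every non-empty subsequence with a bitmask and collects the results in a set. It takes exponential time, so it only suits very short strings (it assumes far fewer characters than a machine word has bits).

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper for illustration: counts distinct non-empty
// subsequences by enumerating all 2^n - 1 non-empty selections.
// Assumes s is short (roughly 20 characters or fewer).
int bruteForceDistinct(const std::string& s) {
    const std::size_t n = s.size();
    std::set<std::string> seen;  // the set discards duplicate strings
    for (unsigned long mask = 1; mask < (1UL << n); ++mask) {
        std::string candidate;
        for (std::size_t i = 0; i < n; ++i) {
            // Bit i of mask decides whether s[i] is kept; order is preserved.
            if (mask & (1UL << i)) candidate += s[i];
        }
        seen.insert(candidate);
    }
    return static_cast<int>(seen.size());
}
```

Calling it on "abc", "aba", and "aaa" gives 7, 6, and 3, matching the expected outputs.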

Constraints

  • 1 <= s.length <= 2000
  • s consists of lowercase English letters.

Approach and Intuition

The task requires us to compute distinct subsequences in a given string. Understanding the nature of subsequences is pivotal. Each character in the string can either be included or excluded from a subsequence, potentially doubling the number of subsequences as we move from one character to the next. However, actual counting gets complex due to the requirement to count only distinct subsequences.

Key Observations and Steps

  1. Each additional character in the string offers a potential to create new subsequences by appending itself to all previously formed subsequences. For instance, with string "ab", the new character 'b' can help form "b", "ab" from the old subsequences "" and "a".

  2. The challenge comes from managing repetitions. If a character repeats in the string, it can lead to counting the same subsequences multiple times. For instance, in "aba", appending the second 'a' to the empty subsequence yields "a" again, which was already counted when the first 'a' was processed.

  3. To handle repetitions, track where each character last occurred and how many subsequences existed at that point:

    • Keep a list of running subsequence counts together with the most recent position of each character (equivalently, a count of the subsequences ending with each character).
    • As each character in the string is processed, double the previous count, then subtract the count recorded just before that character's previous occurrence, since appending the character to those older subsequences reproduces strings that were already counted.
  4. The modulo operation (modulo (10^9 + 7)) keeps the numbers manageable: the count can nearly double with each character, so for strings close to the maximum length of 2000 it far exceeds the range of any fixed-width integer.

Together, these steps form a dynamic programming solution: maintain a running list of subsequence counts and update it as we iterate through the string. This produces the count in a single pass, whereas generating every subsequence explicitly would take exponential time.
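
The idea in step 3 of storing counts of subsequences ending with each character can be sketched directly. This is one possible variant under stated assumptions, not the implementation used in the Solutions section: countWithEndsWith, endsWith, and total are names chosen here for illustration. It relies on the observation that appending a character to every subsequence seen so far (plus the character on its own) yields total + 1 distinct strings ending in that character, which replaces the old count for that character.

```cpp
#include <array>
#include <string>

// Illustrative variant: tracks, per letter, how many distinct subsequences
// end with that letter. Assumes s contains only lowercase English letters.
int countWithEndsWith(const std::string& s) {
    const long long MOD = 1000000007LL;
    std::array<long long, 26> endsWith{};  // all zeros initially
    long long total = 0;                   // distinct non-empty subsequences so far
    for (char ch : s) {
        const int c = ch - 'a';
        // Every earlier subsequence extended by ch, plus ch by itself.
        const long long updated = (total + 1) % MOD;
        // Swap the old contribution of this letter for the new one,
        // keeping the running total in [0, MOD).
        total = ((total - endsWith[c] + updated) % MOD + MOD) % MOD;
        endsWith[c] = updated;
    }
    return static_cast<int>(total);
}
```

Because only 26 counters are kept, this variant needs constant extra space, and the empty subsequence never enters the total, so no final decrement is required.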

Solutions

cpp
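#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int countDistinctSubsequences(string str) {
        const int length = str.length();
        const int MODULO = 1000000007;

        // subsequences[i] = number of distinct subsequences (including the
        // empty one) of the first i characters.
        vector<int> subsequences(length + 1);
        subsequences[0] = 1;
        // lastSeen[c] = index of the most recent occurrence of letter c, or -1.
        vector<int> lastSeen(26, -1);

        for (int index = 0; index < length; index++) {
            int charIndex = str[index] - 'a';
            // Each existing subsequence either skips or takes str[index].
            // 2 * (MODULO - 1) still fits in a 32-bit int.
            subsequences[index + 1] = subsequences[index] * 2 % MODULO;
            if (lastSeen[charIndex] != -1) // if not the first occurrence
                subsequences[index + 1] -= subsequences[lastSeen[charIndex]];
            // Adding MODULO first keeps the remainder non-negative.
            subsequences[index + 1] = (subsequences[index + 1] + MODULO) % MODULO;
            lastSeen[charIndex] = index;
        }
        // Exclude the empty subsequence.
        subsequences[length]--;
        if (subsequences[length] < 0) subsequences[length] += MODULO;
        return subsequences[length];
    }
};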

The C++ code above counts the distinct non-empty subsequences of a given string, working modulo 1000000007 to keep the values in range. It uses dynamic programming, storing intermediate results in the subsequences vector. Here’s a breakdown of how the solution works:

  • subsequences[i] holds the number of distinct subsequences of the first i characters of the string, including the empty subsequence.
  • lastSeen stores the most recent position of each character in the string to avoid counting duplicate subsequences.

Key steps include:

  1. Initialize subsequences[0] to 1, to account for the empty subsequence.
  2. As you iterate over each character in the string, double the previous count: every existing subsequence can either skip the current character or end with it.
  3. If the character has appeared before, subtract the count stored at the index of its previous occurrence (the number of subsequences that existed just before that occurrence) to remove duplicates.
  4. Reduce the count modulo MODULO so that it stays within the range of int.
  5. Finally, decrement subsequences[length] to exclude the empty subsequence and adjust for any negatives.

Each character is handled in constant time inside a single indexed loop, so the method runs in O(n) time and uses O(n) space for the subsequences vector.
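
To see how the table evolves, it can help to return the whole vector instead of only the last entry. The sketch below is an illustrative restatement of the same recurrence; prefixCounts is a hypothetical helper name, and the entries still include the empty subsequence.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tracing helper: returns dp where dp[i] is the number of
// distinct subsequences (empty one included) of the first i characters.
// Assumes s contains only lowercase English letters.
std::vector<long long> prefixCounts(const std::string& s) {
    const long long MOD = 1000000007LL;
    std::vector<long long> dp(s.size() + 1, 0);
    dp[0] = 1;                      // only the empty subsequence
    std::vector<int> last(26, -1);  // previous index of each letter
    for (std::size_t i = 0; i < s.size(); ++i) {
        const int c = s[i] - 'a';
        dp[i + 1] = dp[i] * 2 % MOD;  // skip or take s[i]
        if (last[c] != -1) {
            // Drop the strings already produced at the previous occurrence.
            dp[i + 1] = (dp[i + 1] - dp[last[c]] + MOD) % MOD;
        }
        last[c] = static_cast<int>(i);
    }
    return dp;
}
```

For "aba" the table is 1, 2, 4, 7: the last step computes 2 * 4 - 1, where the subtracted 1 is the count before the first 'a'. Removing the empty subsequence from the final entry gives the expected answer of 6.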

java
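import java.util.Arrays;

class Solution {
    public int countDistinctSubsequences(String inputString) {
        final int MODULO = 1_000_000_007;
        int stringLength = inputString.length();
        // dpArray[i] = distinct subsequences (including the empty one)
        // of the first i characters.
        int[] dpArray = new int[stringLength + 1];
        dpArray[0] = 1;

        // lastPosition[c] = index of the most recent occurrence of letter c, or -1.
        int[] lastPosition = new int[26];
        Arrays.fill(lastPosition, -1);

        for (int i = 0; i < stringLength; ++i) {
            int index = inputString.charAt(i) - 'a';
            // Each existing subsequence either skips or takes the current character.
            dpArray[i+1] = dpArray[i] * 2 % MODULO;
            if (lastPosition[index] >= 0)
                dpArray[i+1] -= dpArray[lastPosition[index]];
            // Adding MODULO first keeps the remainder non-negative.
            dpArray[i+1] = (dpArray[i+1] + MODULO) % MODULO;
            lastPosition[index] = i;
        }

        // Exclude the empty subsequence.
        dpArray[stringLength]--;
        if (dpArray[stringLength] < 0) dpArray[stringLength] += MODULO;
        return dpArray[stringLength];
    }
}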

The Java solution provided counts the distinct subsequences in a given string using dynamic programming and modulo arithmetic to handle large numbers. The implementation involves these key steps:

  1. Define the modulo constant to prevent overflow.
  2. Initialize a dpArray where each element dpArray[i] stores the count of distinct subsequences of the first i characters of the input string, including the empty subsequence.
  3. Set the initial condition with dpArray[0] = 1 to account for the empty subsequence.
  4. Use a lastPosition array to keep track of the last position of each character in the alphabet encountered in the input string.
  5. Loop through each character of the input string:
    • Calculate the current character index relative to 'a'.
    • Double the previous count of subsequences, reducing it modulo MODULO to avoid overflow.
    • If the character has been seen before, subtract the count stored at the index of its last position (the number of subsequences that existed just before that occurrence) to avoid counting duplicates, then reduce the result modulo MODULO.
    • Update this character's last position with the current index.
  6. Decrement the result stored in dpArray[stringLength] by one to exclude the empty subsequence.
  7. Ensure the final count of subsequences is non-negative by adjusting with the modulo if necessary.

This method calculates the number of unique subsequences in a single pass over the string, using modular arithmetic to keep the numbers within int range. It is a direct application of dynamic programming to a combinatorial counting problem.
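
The non-negativity adjustment in the final step matters because, in both C++ (since C++11) and Java, the % operator takes the sign of the dividend, so a subtraction followed by % can leave a negative remainder. A minimal sketch of the usual normalization is shown below; normalizeMod is a hypothetical helper name and assumes a positive modulus.

```cpp
// Hypothetical helper: maps any value to its representative in
// [0, modulus). Assumes modulus > 0.
long long normalizeMod(long long value, long long modulus) {
    // value % modulus lies in (-modulus, modulus); adding modulus makes it
    // positive, and the second % brings it back below modulus.
    return ((value % modulus) + modulus) % modulus;
}
```

For instance, normalizeMod(-1, 1000000007) yields 1000000006, which is the value the final adjustment produces when the stored count is 0 before the decrement.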
